ABSTRACT


Network emulation is a valuable, though potentially resource intensive, method for virtualizing
networks for analysis or testing. Though high-powered servers are becoming
increasingly accessible, the size and complexity of physical networks have increased in a 
similar fashion, thereby limiting the type and size of networks that can be emulated on a 
single physical machine. In this thesis, we present a tool that allows the developers of ground 
truth topologies to distribute the emulation requirements across multiple physical machines, 
thereby increasing the size of networks that can be emulated. First, we reexamine existing 
tools to discover current methods for emulating synthetic and physical networks. Then we 
modify an existing platform to enable execution on multiple machines, while increasing 
flexibility for future extensions. We then develop methods for efficiently distributing the 
topology among the available resources in order to maximize the potential scale. Finally, 
we run a series of scenarios simulating real world events, such as a Border Gateway Protocol 
(BGP) hijack attack, in order to demonstrate the utility and efficiency of the system. 
CHAPTER 1:
Introduction 


Virtual networks, both emulated and simulated, are important tools for network research. 
For example, a network researcher may be interested in analyzing how a worm propagates 
through a network, or a network administrator may want to view policy change impacts prior 
to deployment. In both instances, best practices dictate that a non-production network is 
required to facilitate such exploration and understanding without impacting the production 
network. Additionally, in order to have reproducible results, a known network, which can 
be easily and identically recreated, is necessary. Often, a large network is required to fully 
analyze a problem; however, creating a physical test bed of non-trivial size is frequently time
and cost prohibitive. To circumvent these constraints, researchers or administrators can turn 
to network simulation or network emulation. 

As the size and scope of real world networks increase, the resources required to efficiently
execute emulations and simulations of these networks also increase. Though available
computing power of commodity hardware continues to grow, the scale and complexity of 
networks is also increasing at a commensurate rate. Clearly, we do not have access to 
unlimited resources. 

We term a collection of routers and their interconnection a network “topology.” Many 
problems, such as the analysis of Border Gateway Protocol (BGP) behavior, a subject we 
examine as a case study in this thesis, can benefit from large interconnected topologies. The 
real-world complexity of a protocol such as BGP may be better analyzed using an emulated 
topology versus a simulated topology. The development of emulation architectures that can 
scale to the size of the network topology and efficiently use the resources available is a 
challenging task and is the focus of this thesis. 

Administrators aim to simplify development, expedite execution, and mitigate risk in the 
testing and analysis of their networks. Virtualized networks are often used for these tasks [1].
Within a controlled environment, modifications to a network, protocol, or model can be 
executed - and the effects measured - without affecting real-world users or services. In a 
similar vein, researchers also require large networks that they can analyze, modify,
or develop against [1]. Developing a ground-truth network topology to determine expected
behaviors reduces the complexity of analysis and reduces the potential for unexpected 
side effects. Additionally, using virtual networks allows a large space of scenarios, e.g., 
traffic models or faults, to be executed in an automated fashion, thereby improving the 
thoroughness of the analysis. Automating the model creation, network creation, and analysis 
can greatly improve the efficiency of scenario development and enables common practices 
like Monte Carlo analysis [2]. For example, Costatino et al. used both network simulation 
and Monte Carlo analysis to produce their results when analyzing performance of Long 
Term Evolution (LTE) gateways [3]. 

Many research scenarios allow problems to be represented mathematically using formally 
defined models. For example, one could represent a graph using a simple adjacency
matrix and execute any number of previously defined algorithms against said matrix. In 
these instances network simulation can provide an efficient method for executing these 
mathematical models, and can be an efficient way to test and analyze newly developed 
scenarios. Complex behaviors can be programmatically executed, and efficiently simulated 
- often using a single physical computer. 

While simulation is a valuable tool, further specificity may be desired and simulators may 
not capture the true, nuanced behavior of actual and deployed systems. For example, administrators
of large networks may enforce routing policies to maximize profits. Such policies
may contradict expected protocol behaviors. A BGP simulator which implements behavior 
based only on the defined standard may always choose the path with the fewest Autonomous 
System (AS) hops, while an Internet Service Provider (ISP) may prefer a path with the most 
monetary gain. The Request for Comments (RFC) for BGP contains “suggested” and “may”
behaviors, such as support for or values of BGP timers. Without visibility into a proprietary 
implementation, there is no way to know how the protocol is actually implemented. These 
values will affect network behavior such as packet propagation and convergence time, and 
are difficult to capture using a simulator. Network emulation is a technique for simulating 
network properties by mimicking specific component behavior [4]. Network emulation can 
provide real-world behavior for network scenarios and more closely copies the behavior 
of actual, physical network devices. Often, proprietary implementations of a protocol are 
a “black-box” and simulating these implementations is challenging. Emulation uses the 
actual behavior of the proprietary software and removes the attempt at copying implementation
specifics. Additionally, compared with simulation, emulation has several added benefits such
as: allowing actual network traffic to traverse between the virtual network and real world 
networks; and deploying the actual behavior of any operating system software. 

Many emulation and simulation tools exist. However, as the size and complexity of real
world networks grow, the physical limitations of these tools may be reached. Emulation 
often requires resources equal to the physical hardware being emulated, plus some amount 
of overhead. This resource utilization limits the number of instances which can be emulated
on a given physical architecture and requires careful consideration when creating
emulated networks. Though thoughtful software implementations can mitigate the amount 
of resources required, many software packages are developed to run on a single physical 
instance, and are unable to span multiple physical machines. These limitations, coupled
together, constrain the scale to which we can emulate network topologies.


1.1 Motivation 

The primary motivation within this thesis is to enable existing network emulators to run 
at an order of magnitude larger scale than currently possible. While much can be gained 
by conducting research on small, representative networks, in many cases constructing a 
derived network similarly sized to the desired network is preferable or necessary. With a 
surplus of computing resources available in the form of powerful servers or Infrastructure 
as a Service, we strive to remove the constraints imposed by the resources available on a 
single, physical machine or by software coupled to a single machine, providing researchers 
a valuable option for virtualizing large networks. 


1.2 Research Questions and Contributions 

In our work, we investigate these primary questions: 

1. How can we emulate networks an order of magnitude larger in size than currently 
possible? 

2. How can we best automate virtual network creation, and efficiently distribute emulated 
network elements across heterogeneous physical resources? 

3. What is the extent to which more complex and realistic Internet-like topologies can
be modeled on the emulated network? 

4. As a concrete application to run on the emulated network, can we model a BGP hijack 
attack? 

We contribute to and extend the state-of-the-art through the following:

1. We develop DERIK, an automated tool that can generate a large, virtualized network 
based on an existing graph model, distribute the emulated topology across a number 
of physical servers, and extend the ability to generate analysis scenarios. 

2. We develop a linear program that optimizes the distribution of the emulated devices 
across physical machines to maximize the scale of the networks able to be emulated.

3. We demonstrate up to a 37% reduction in traffic on the physical link compared to a 
uniform distribution in our experiments. 

4. We develop a proof-of-concept BGP hijacking application, which demonstrates the 
utility and effectiveness of Distributed Emulation Router Inference Kit (DERIK). 


1.3 Thesis Structure 

The remainder of this thesis is structured as follows:

1. In Chapter 1, we discuss the application of distributed emulation, its challenges, and
our motivations. 

2. In Chapter 2, we investigate existing network virtualization and distribution
techniques.

3. In Chapter 3, we describe our upgrade of an existing tool to create large-scale,
distributed emulated network topologies.

4. In Chapter 4, we discuss the results we obtained from analyzing the efficiency of 
our distributions, as well as the scale to which we were able to build our emulated 
topologies. 

5. In Chapter 5, we discuss our conclusions and possibilities for future work. 
CHAPTER 2:

Background and Related Work 


This thesis significantly modifies and extends an existing system to enable distributed 
emulation of large-scale network topologies. We first discuss simulation, emulation, and 
methods of network virtualization. Next, we discuss the choice of systems to extend and 
our motivation for selecting a particular emulation platform. We then discuss methods used 
previously for distributing computing requirements in other software and technical systems. 
Finally, as our system uses linear programming to optimize the distribution of resources 
among physical assets, we briefly review optimization problems, and ways in which these 
methods have been applied to previous software platforms. 


2.1 Existing Network Virtualization Tools 

Many tools have been developed in an attempt to reduce setup time, reduce execution time, 
or reduce the configuration required for the virtualization of networks. These tools can be
generally segmented into two classes: simulation software and emulation software, each 
with its own subsets of capabilities and limitations. Both classes contain robust tools which 
can provide high quality virtualizations with regards to the specific desired characteristics of 
a chosen problem and can provide value to a network administrator, researcher, instructor, 
or student. The selection of one over another is dependent on the resources available and 
problem being analyzed. 

2.1.1 Simulation tools 

Simulation software can be an efficient method for inferring patterns in network models. 
Multiple layers of the network stack, such as environmental characteristics, mobility
patterns, traffic patterns, or routing protocols, can be pragmatically modeled using simulation
software [5]-[7]. Analyzing models and patterns in virtual networks can be performed at
several levels of abstraction using simulation methods. In the simplest form of simulation,
mathematical or analytical models can be created and analyzed using software such
as MATLAB [8]. As an example, networks can be represented as a weighted adjacency
matrix, with edges represented by matrix values greater than zero. A desired algorithm,
such as the Bellman-Ford algorithm [9], can be executed on the matrix and the results
obtained relatively easily. Though these methods can be simply implemented, many real-world
network considerations, such as routing policies and implementation details, may be
removed by such an abstracted representation. For example, without added configuration
and complexity, the effects of packet loss due to interference or congestion will be missing
from a simulated environment. Adding this nuanced behavior into a mathematical model
is possible, but can become cumbersome or overly complex as additional layers of a stack
are included.
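
To make the adjacency-matrix approach concrete, the following is a minimal sketch of Bellman-Ford run directly on such a matrix. The four-router topology and its link costs are hypothetical, and the sketch assumes that a value of zero means "no edge" and that all costs are positive.

```python
import math

def bellman_ford(adj, source):
    """Shortest-path costs from `source` over a weighted adjacency matrix.

    adj[u][v] > 0 is the cost of edge u -> v; 0 means "no edge".
    """
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    # Relax every edge up to n - 1 times, stopping early once nothing changes.
    for _ in range(n - 1):
        changed = False
        for u in range(n):
            if dist[u] == math.inf:
                continue
            for v in range(n):
                w = adj[u][v]
                if w > 0 and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
        if not changed:
            break
    return dist

# Hypothetical four-router topology with symmetric link costs.
ADJ = [
    [0, 1, 4, 0],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [0, 6, 3, 0],
]

print(bellman_ford(ADJ, 0))
```

Nothing in this representation captures routing policy, packet loss, or vendor behavior, which is the abstraction cost discussed above.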

As an extension to mathematical modeling, specially designed network simulators can
provide granularity to scenarios where purely analytical methods may overly simplify, or
can be used to validate analytical models [10]. These tools have pre-developed layers and
models which can be used by a developer to more fully flesh out the behaviors of the protocol
or platform being simulated. Tools such as ns-3 have been developed to provide a higher
granularity in network simulation scenarios. ns-3 is a discrete-event network simulator
and provides a simulation environment where researchers can use pre-defined models in
combination with desired parameters to analyze network environments and scenarios [11].
A developer can simulate and customize multiple layers of the network stack to provide a
robust simulation environment. Additionally, much of the platform is extensible, allowing
users to thoroughly customize the scenarios developed [11]. Feng et al. developed an
extension to the ns-3 family of software, ns-BGP, which was created in order to enable
ns-2 to simulate BGP routing [12]. Multiple other efforts have been made at simulating
inter-domain routing including BSIM [13], BGPSIM [14], and an algorithmic approach
detailed by Gill et al. [5]. Optimized Network Engineering Tool (OPNET) is another robust
simulation tool and has been used to model a variety of layers and protocols including queue
management [15] or comparison of internal routing protocols [16].
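
The term "discrete-event" can be illustrated with a short sketch. This is not how ns-3 is implemented; it is a generic event loop, and the event names, handlers, and link delay are assumptions chosen for illustration. Simulated time jumps from one scheduled event to the next instead of advancing in real time.

```python
import heapq

def run_events(initial_events, handlers, until=float("inf")):
    """Process (time, name, payload) events in timestamp order.

    handlers[name](time, payload) returns an iterable of new events.
    """
    queue = []
    seq = 0  # tie-breaker so events with equal timestamps keep insertion order
    for time, name, payload in initial_events:
        heapq.heappush(queue, (time, seq, name, payload))
        seq += 1
    log = []
    while queue:
        time, _, name, payload = heapq.heappop(queue)
        if time > until:
            break
        log.append((time, name, payload))
        for new_time, new_name, new_payload in handlers[name](time, payload):
            heapq.heappush(queue, (new_time, seq, new_name, new_payload))
            seq += 1
    return log

LINK_DELAY = 5  # hypothetical one-way delay, in simulated milliseconds

def on_send(time, packet):
    # Sending a packet schedules its arrival one link delay later.
    return [(time + LINK_DELAY, "recv", packet)]

def on_recv(time, packet):
    # Receiving a packet schedules nothing further in this sketch.
    return []

HANDLERS = {"send": on_send, "recv": on_recv}

print(run_events([(0, "send", "p1"), (2, "send", "p2")], HANDLERS))
```

Only the behavior written into the handlers is ever executed, which is why a simulator's fidelity depends on the models supplied to it.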

Some network simulators are scalable, and are able to simulate large network topologies,
though they are often limited by some combination of time and available resources. For
example, ns-3 has been evaluated for single-threaded performance and, though efficient as
compared to other simulators, is still constrained by the available memory in a system [17].
In Weingartner et al.’s scenario [17], the researchers were able to simulate a 3000 node
network which utilized approximately 60 MB of memory, and their results indicate a linear
relationship between the number of nodes and memory required, which places an upper limit
on the network size with regards to the available memory in a system.
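
The upper limit implied by a linear relationship can be estimated with simple arithmetic. The sketch below extrapolates from the single data point quoted above (3000 nodes at roughly 60 MB) and assumes a purely proportional model with no fixed overhead; that is a simplification for illustration and not a result from [17].

```python
REF_NODES = 3000       # node count reported in the cited scenario
REF_MEMORY_MB = 60.0   # approximate memory used at that size

def max_nodes(available_mb, ref_nodes=REF_NODES, ref_mb=REF_MEMORY_MB):
    """Upper bound on simulated nodes under a purely proportional memory model."""
    # Multiply before dividing to avoid rounding a tiny per-node figure.
    return int(available_mb * ref_nodes // ref_mb)

# Example: a host with 1 GB (1024 MB) available to the simulator.
print(max_nodes(1024))
```

Under these assumptions each node costs about 20 KB, so the bound grows in direct proportion to the memory available on the host.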

In order to increase scalability and remove the constraints of a single physical machine, 
advances to simulation systems have been developed to utilize parallelization methods. 
Research has been conducted on using message passing methods, specialized hardware, 
and customized routing protocols in conjunction with ns-3 in order to increase the scale 
and execution speed of simulated network topologies [18]-[21]. Using High Performance 
Computing (HPC) platforms researchers created simulated networks of 36,044, 800 nodes 
[20], 750,000 nodes [19], and 10^ nodes [21] respectively. These results demonstrate 
a significant increase in the scale of simulated networks by distributing resources across 
multiple physical hosts. 


2.1.2 Limitations of simulation software 

Simulation is often adequate for simulating simple characteristics of networks or protocols 
but may fall short in simulating a full implementation of a complex system. Inter-domain 
routing protocols are an example where simulation can be especially challenging. The 
BGP RFC itself is over 100 pages long. Accurately simulating all the prescribed behaviors 
defined within BGP is a demanding task. Per the RFC, parts of the BGP standard,
such as HoldTime [22], are suggested rather than required. Therefore,
we are unable to truly simulate a given proprietary “black-box” vendor implementation of
said standard. Additionally, much of true BGP behavior in real-world networks is the result 
of implementation specific differences, which adds further complexity to the simulation. 

Though simulation can be a robust method for virtualizing networks, simulation is truly 
an estimate, and can rarely capture the true behavior of real world networks [23]. Only 
the behaviors specifically defined within the simulation are executed, possibly removing 
undefined behaviors or second order effects. Simulating proprietary platforms presents 
further challenges. These systems may be a black box, where the developers of the
simulation lack access to the actual implementations and can only observe the input and 
output. Rampfl et al. describe several instances of simulations providing high quality 
representations of reality where there are also notable differences in certain measurements 
and note the requirement for credibility and validation of simulation models [23]. The trust 
one can place on these simulations is then a function of the validation and testing done on 
a model or simulation coupled with the amount of modifications made to the model for an
individual instance. Any extension of the simulation or model also relies on the correctness 
of the underlying model [23]. 

Though a simulation may be accurate for a particular set of features or functionality, it may 
lack the complete feature set of a real world implementation. Packet Tracer is a popular
simulation tool often used in academic environments. It provides a simple, graphical 
interface for creating, interacting with, and analyzing networks and network traffic. Though 
simple to use, it lacks simulated features and behaviors implemented in real-world Cisco 
platforms such as Internal Border Gateway Protocol (IBGP) [24]. 

Simulation software also faces scalability limits as discussed earlier in Section 2.1.1. Representation
of nodes and traffic requires memory resources and both the execution and
analysis of a simulated topology requires processor time. As the size of a topology grows, 
the memory required, time to execute, and processing requirements can increase. This
constrains an administrator to one of three options: decrease the size of the network to fit within the
available physical resources, provide more physical resources to accommodate the size of 
the network, or potentially increase the execution time. 

2.1.3 Emulation software 

Emulating network devices is a useful method for obtaining near real world behavior without 
the complications involved with the procurement, installation, and management of physical 
hardware. Emulation software works by providing an abstraction layer between physical 
hardware and some software which expects to run on a different hardware platform. This 
allows the execution of said software on hardware not originally intended for this use. 
This is analogous to commonly used “Virtual Machines”, or virtual computers running real 
operating systems via hypervisors such as VirtualBox [25]. Additionally, emulation software
provides the capability of interacting with actual network traffic by injecting real network 
traffic into the emulation environment, exporting generated traffic into a real network, or 
both. 

The software package Quagga [26] is a popular choice for emulating routing on Unix systems.
Quagga provides an abstraction layer between Unix hardware and Quagga clients. It supports 
many popular routing protocols and is extensible for custom administration [26]. 
Graphical Network Simulator - 3 (GNS3) is a software suite which provides a graphical user
interface for creating emulated networks [27]. In order to emulate Cisco devices, GNS3 
uses Dynamips, a tool created to emulate the Microprocessor without Interlocked Pipeline 
Stages (MIPS) instruction set [28]. Dynamips runs actual Cisco Internetwork Operating
System (IOS) binaries, which operate identically to the IOS running on physical hardware.
None of the behavior controlled by software is abstracted or removed by an emulated router 
as opposed to a simulated instance. Dynamips provides an Application Programming 
Interface (API), which is used by the user interface of GNS3 to create emulated Cisco
devices in the background. While this is a powerful and useful tool for easily creating 
emulated networks, it is not designed for the automation of network creation and creates 
some constraints on the size of topologies which can be created. GNS3 uses a graphical 
user interface for the creation of a network and much of the work is done by clicking, 
dragging, and then manually configuring an emulated instance. This can be beneficial for 
small, simple networks and for less experienced administrators, but becomes tedious or 
impractical when trying to create many networks of many nodes. 

AutoNetkit is another emulation platform that attempts to remove some of the limitations
of tools like GNS3 [29]. AutoNetkit deploys an emulated network automatically, without
requiring manual configuration, based on a provided model. It can automatically create the required
configuration files and build a fully functioning emulated network, with limited effort
required by an administrator. AutoNetkit uses the User-Mode Linux kernel and allows models
to be inputted in several formats, such as GraphML, reducing the requirement for a specific
user interface [29]. In order to scale the size of the emulated topologies, AutoNetkit
uses Virtual Distributed Ethernet (VDE) to create virtual connections between physical
devices and enable the deployment of a topology across multiple physical boxes [29].
Additionally, reference [30] indicates that AutoNetkit can deploy to Dynamips, but we
were unable to evaluate this functionality due to a lack of documentation. We downloaded,
installed, and executed AutoNetkit for a given
topology and analyzed the results, but were unable to find the required scripts for deploying
to Dynamips. The AutoNetkit authors also indicate that the software allows the use of multiple servers
to run a topology, but we were unable to locate publications on the methods, scale, or
performance of this capability.

Emulated Router Inference Kit (ERIK) was designed as a platform to create ground truth 
emulated networks, which could be used to automate topology inference [28]. ERIK is
written in Python and, similar to AutoNetkit, creates emulated topologies using a pre-created
model as input [28]. Additionally, ERIK automates the processes of network
modification and network measurement by leveraging additional capabilities of Dynamips 
and the inference software scamper [31]. ERIK takes in a model, creates the required 
Dynamips commands, executes the commands, and then modifies the network based on a 
set of pre-defined scenario scripts. Throughout the process, multiple vantage points (in the 
form of a virtual host running on the same server that changes virtual network attachment 
point) are used to measure the results of the network modifications. ERIK serves as the 
foundation for this thesis. 

2.1.4 Limitations of current emulation tools 

Emulation of network devices can be a resource intensive endeavor. Often, the emulated device
requires that the full amount of memory normally available on the physical hardware
it expects be allocated continually, creating a hard limit on the number of emulated devices a
physical host can support. For example, a Cisco c7200 series router with a Network
Processing Engine (NPE) NPE-400 has a default of 128 megabytes (MB) and is expandable
up to 512 MB for the NPE alone. The amount of memory used on a platform at any given
time is a function of the IOS running, the number and type of network modules installed,
and the amount of traffic processed. If the memory requirement of a router is not met,
the router often does not fail gracefully, therefore creating a need for consideration in deployment.
Routing can also be a processor intensive operation. Without careful allocation by the 
hypervisor, emulated devices can use continuous high levels of processing power from their 
physical host, quickly using the entire capacity of the physical host. Dynamips requires the 
manual configuration of an idle-pc value, which limits the amount of host processor used 
by the emulated device. 

At the time of writing, some existing emulation software expects the entirety of a network 
topology to be located on a single physical machine. Stephen Guppy, the CEO of GNS3,
indicates that a “multi-tenancy” feature is currently in development, but was not available
at the time of writing [32]. ERIK creates execution scripts which are expected to be run on 
a single host and does not provide the interconnections to deploy on multiple hosts. The 
requirement to emulate the entire topology on a single host limits the number of emulated 
instances which can be created. For example, Rye et al. used a physical server with 16
1.2 gigahertz (GHz) cores and nearly 99 gigabytes (GB) of memory, and were able to create
a topology of 300 emulated Cisco routers before running into memory constraints [28].
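
The memory bound on a single host, and the number of hosts a larger topology would need, can be approximated as follows. This is a back-of-the-envelope sketch: the 512 MB per-router allocation is a hypothetical figure taken from the upper NPE limit mentioned earlier, processor load is ignored, and the result is not intended to reproduce the 300-router measurement reported in [28].

```python
import math

def routers_per_host(host_mem_mb, router_mem_mb, reserved_mb=0):
    """Routers that fit on one host if each holds its full memory allocation."""
    usable = host_mem_mb - reserved_mb
    if usable <= 0:
        return 0
    return usable // router_mem_mb

def hosts_needed(num_routers, host_mem_mb, router_mem_mb, reserved_mb=0):
    """Identical hosts required for a topology, by memory alone."""
    per_host = routers_per_host(host_mem_mb, router_mem_mb, reserved_mb)
    if per_host == 0:
        raise ValueError("a single router does not fit on this host")
    return math.ceil(num_routers / per_host)

HOST_MEM_MB = 99 * 1024  # roughly the 99 GB server described above
ROUTER_MEM_MB = 512      # hypothetical per-router allocation

print(routers_per_host(HOST_MEM_MB, ROUTER_MEM_MB))
print(hosts_needed(3000, HOST_MEM_MB, ROUTER_MEM_MB))
```

The `reserved_mb` parameter is a placeholder for memory held back for the host operating system and emulator overhead.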

Even if software allows a topology to be distributed across multiple hosts, it may not be 
entirely clear how to optimally distribute said topology. For example, minimizing the 
number of virtual links which traverse physical connections may be required, or minimizing 
the amount of traffic which traverses these links may be required. As the size and scale of
the emulated network grows, these considerations will become more important to ensure 
that a topology can be emulated on a set of physical hosts without surpassing any of the 
resource capabilities of said hosts. 
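
Both criteria mentioned here, the number of virtual links crossing physical connections and the traffic carried over them, can be computed for any candidate placement. The sketch below is illustrative only; the router names, host names, and traffic values are hypothetical.

```python
def cut_metrics(links, placement):
    """Count virtual links, and their traffic, that cross physical hosts.

    links: iterable of (router_a, router_b, traffic) tuples.
    placement: dict mapping router name -> physical host name.
    """
    crossing = 0
    traffic = 0
    for a, b, t in links:
        if placement[a] != placement[b]:
            crossing += 1
            traffic += t
    return crossing, traffic

# Hypothetical four-router ring with per-link traffic estimates.
LINKS = [("r1", "r2", 10), ("r2", "r3", 1), ("r3", "r4", 10), ("r1", "r4", 2)]

BY_PAIR = {"r1": "hostA", "r2": "hostA", "r3": "hostB", "r4": "hostB"}
ALTERNATING = {"r1": "hostA", "r2": "hostB", "r3": "hostA", "r4": "hostB"}

print(cut_metrics(LINKS, BY_PAIR))
print(cut_metrics(LINKS, ALTERNATING))
```

Both placements put two routers on each host, yet they differ sharply in how much traffic must leave a host, which is why the choice of placement matters as topologies grow.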


2.2 Distribution Methods 

The ability to distribute resource requirements across a cluster of resources has been applied
in many fields of computer science. Parallel computing, concurrent computing, and
distributed computing all have methodologies which can be applied toward scaling network 
virtualization. 

Fujimoto et al. [10] discuss two methods for scaling simulations: time parallel simulation 
and parallel discrete events. While time parallel simulation can decrease runtime, it does not 
allow the size of the network simulated to increase. In contrast, the parallel discrete events 
method, as described by the authors, allocates parts of a simulated topology to a processor. 
Thus, parallelizing discrete events is a preferred method for scaling the size of topologies 
as it allows both an increase in size, and a decrease in execution time [10]. Fujimoto et al. 
further discuss the use of multiple instances of a simulator working on disparate parts of a 
simulation, which they deem a “federated simulation” [10] and discuss the benefits of such. 
In these specific instances the authors need to account for specific simulation events, which 
will not be present in emulated environments as the software running on the emulation 
platform will manage these events as designed. 

Yocum et al. discuss “topology partitioning” [33], a method to distribute parts of a network 
topology across multiple physical resources. The methodology used by Yocum et al. relies on
graph partitioning techniques from graph theory, including k-cluster [34] and METIS [35],
to distribute a topology across physical resources. The authors used the ModelNet [36]
emulation platform to analyze their partitioning methods and their analysis shows that 
thoughtful partitioning can nearly double the capability versus random assignment [35]. 
The k-cluster method described returns equally sized connected components where all edges 
are considered equal and the METIS method returns similarly sized components but is able 
to use edge weights [33]. These methods are valuable for homogeneous environments, 
such as distributing load across identical cores of a host, but may not be as applicable for 
heterogeneous environments containing hosts of varying capabilities. In our methodology 
described in Chapter 3, we learn from these applications and attempt to apply some of the 
concepts to a heterogeneous environment. 
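
To illustrate what changes when hosts are heterogeneous, the following sketch places routers greedily while respecting a different capacity on each host. It is neither k-cluster, METIS, nor the method developed in Chapter 3; it is a simple heuristic written for illustration, and all names and numbers in it are hypothetical.

```python
def greedy_place(routers, links, capacity):
    """Assign each router to a host without exceeding any host's capacity.

    routers: ordered list of router names.
    links: iterable of (router_a, router_b, weight) tuples.
    capacity: dict mapping host name -> number of routers it can hold.
    """
    neighbors = {r: {} for r in routers}
    for a, b, w in links:
        neighbors[a][b] = neighbors[a].get(b, 0) + w
        neighbors[b][a] = neighbors[b].get(a, 0) + w
    free = dict(capacity)
    placement = {}
    for r in routers:
        candidates = [h for h in free if free[h] > 0]
        if not candidates:
            raise ValueError("not enough capacity for all routers")

        def score(h):
            # Prefer the host already holding the heaviest-linked neighbors,
            # then the host with the most free slots, then the host name.
            pull = sum(w for n, w in neighbors[r].items() if placement.get(n) == h)
            return (pull, free[h], h)

        best = max(candidates, key=score)
        placement[r] = best
        free[best] -= 1
    return placement

ROUTERS = ["r1", "r2", "r3", "r4"]
LINKS = [("r1", "r2", 10), ("r2", "r3", 1), ("r3", "r4", 10), ("r1", "r4", 2)]
CAPACITY = {"big": 3, "small": 1}  # hosts of unequal size

print(greedy_place(ROUTERS, LINKS, CAPACITY))
```

In this example the heuristic isolates r4 on the small host, cutting links with a total weight of 12, whereas isolating r2 or r3 would cut only 11. A greedy pass therefore respects capacities but does not guarantee the best cut.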


2.3 Linear Optimization 

As defined by Rardin [37], “Optimization models represent problem choices as decision 
variables and seek values that maximize or minimize objective functions of the decision 
variables subject to constraints on variable values expressing the limits on possible decision 
choices.” This process can be used to analyze a problem, construct a mathematical model 
of said problem, and potentially find an optimal solution. 

Linear optimization is a specific form of mathematical optimization where the requirements 
of the model can be represented in a linear relationship [37]. In a feasible linear program, 
one optimizes the objective function subject to parameters and constraints and achieves 
a feasible solution as good as any other feasible solution [37]. Such linear programs are 
expressed using the standard form: 


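\[
\begin{aligned}
\text{maximize} \quad & c \cdot x \\
\text{subject to} \quad & A x \le b \\
\text{and} \quad & x \ge 0
\end{aligned}
\]

where $x$ is a vector of variables, $c$ and $b$ are vectors of coefficients, and $A$ is a matrix of
coefficients [37].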
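
As a small worked instance of this form, consider two variables and two constraints. The numbers are illustrative assumptions and do not come from our experiments.

```latex
\[
\begin{aligned}
\text{maximize} \quad & 3x_1 + 2x_2 \\
\text{subject to} \quad & x_1 + x_2 \le 4 \\
& x_1 + 3x_2 \le 6 \\
\text{and} \quad & x_1, x_2 \ge 0
\end{aligned}
\qquad\text{that is,}\qquad
c = \begin{pmatrix} 3 \\ 2 \end{pmatrix},\;
A = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix},\;
b = \begin{pmatrix} 4 \\ 6 \end{pmatrix}
\]
% Vertices of the feasible region and their objective values:
%   (0,0) -> 0,   (4,0) -> 12,   (3,1) -> 11,   (0,2) -> 4
```

Because an optimum of a feasible, bounded linear program is attained at a vertex of the feasible region, comparing the four vertices is sufficient here: the optimal solution is $x = (4, 0)$ with objective value 12.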
Given these inputs, one of several methods, such as the Simplex algorithm [37], can be used to
“solve” or find the optimal solution for the problem. Specialized solver software is usually
used to solve non-trivial linear programs and is available via open source repositories.
The Computational Infrastructure for Operations Research (COIN-OR) solver is one such
implementation and was used throughout our experiments [38]. Additional solver
packages exist, such as the GNU Linear Programming Kit (GLPK) [39] and Gurobi [40],
though these were not used during our testing.

After a problem is identified, it needs to be properly fleshed out and formulated prior to
being implemented programmatically. Naval Postgraduate School (NPS) format is a popular
method to describe a linear program [41]. In this format, a problem is expressed using the
sections: indices, data, decision variables, and formulation [41]. Indices are used in a
similar fashion to other fields and indicate a specific instance of a variable. Data is problem
specific input. Decision variables are the variables for which the problem is solved.
The formulation is the actual problem, expressed as a mathematical model, using the
other sections indicated [41]. In Chapter 3 we use this method to describe our problem as
it provides a clear written representation of the problem which can then be implemented in
whichever programming language is desired.

Linear programming is commonly used to analyze problems in routing such as the minimum
cost network flow problem. In this specific instance, a linear program determines the
minimum cost to traverse from a source to a sink. The network can be expressed as a
weighted digraph, where links are represented by arcs and the weight indicates the cost of
traversing the arc [37]. This example can be represented by a graph as shown in Figure 2.1.
Using a linear program provides a benefit over traditional network path selection algorithms,
such as Bellman-Ford [9], as it allows additional constraints or input to be added to a
problem.
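
The source-to-sink instance described above can be written as a linear program as follows. This is the usual textbook formulation rather than one quoted from [37]; the symbols $c_{ij}$ (cost of arc $(i,j)$), $x_{ij}$ (flow on arc $(i,j)$), $s$ (source), and $t$ (sink) are our own notation for a digraph with vertices $V$ and arcs $E$.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{(i,j) \in E} c_{ij}\, x_{ij} \\
\text{subject to}\quad & \sum_{j:(i,j) \in E} x_{ij} - \sum_{j:(j,i) \in E} x_{ji} =
\begin{cases} 1 & i = s \\ -1 & i = t \\ 0 & \text{otherwise} \end{cases}
\quad \forall i \in V \\
& x_{ij} \ge 0 \quad \forall (i,j) \in E
\end{aligned}
```

An additional restriction, for example an upper bound $x_{ij} \le u_{ij}$ on selected arcs, can be added as one more constraint without changing the solution method.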


Figure 2.1: A linear program can find an optimal solution for the minimum
cost path from the source to the sink. In this case the path would be: source
- A - D - sink, for a total cost of 8.


2.4 Challenges in Modeling BGP

BGP is the prevalent protocol for routing traffic between Internet domains and has been
widely used for over 20 years. The original BGP RFC was created in 1994 and has since
been revised three times, with the latest, RFC 4271, containing another six updates [22].
BGP's long and complex standard makes simulating and emulating it challenging, while its
widespread use as the inter-domain routing protocol for the Internet makes doing so a
useful endeavor.

One aspect which makes BGP simulation and emulation challenging is the common practice
of implementation-specific policies which can override default behavior. Much of the
behavior of the inter-domain traffic within the Internet is derived via business agreements
between the organizations that route traffic between their ASs. For example, an organization
that provides Internet service may choose to route traffic over a path for which it
receives a higher monetary rate rather than (following BGP default behavior) a path which
may have fewer AS hops.
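
The effect of such a policy can be sketched in a few lines of Python. This is a simplified illustration, assuming only the first two steps of route selection (local preference, then AS path length); the `Route` class, its field names, and the AS numbers (taken from the documentation range) are hypothetical and not part of any BGP implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """A candidate route to one prefix (hypothetical, simplified)."""
    next_hop_as: int
    as_path: Tuple[int, ...]
    local_pref: int  # set by local policy, e.g. higher for a better-paying neighbor


def best_route(candidates: List[Route]) -> Route:
    """Prefer the highest local preference, then the shortest AS path."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


# Default behavior would pick the two-hop path; policy raises the preference
# of the three-hop path, so it wins despite being longer.
short_path = Route(next_hop_as=64501, as_path=(64501, 64510), local_pref=100)
paid_path = Route(next_hop_as=64502, as_path=(64502, 64503, 64510), local_pref=200)

print(best_route([short_path, paid_path]).next_hop_as)  # prints 64502
```

With equal local preference values the same function falls back to the shorter AS path, which mirrors the default behavior described above.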

The sheer size of the Internet is another contributing factor to the challenges of modeling
BGP. At the time of writing, the Center for Applied Internet Data Analysis (CAIDA) indicates
that there are 54130 ASs with 22137 observed relationships [42]. From an emulation
perspective this poses two challenges: the platform must have the capability to emulate
many devices, and it must have the memory and processing capability to support the large
routing tables required for all devices. Additionally, as Gill et al. discuss, discovering a
ground truth is challenging due to both factors mentioned above: the large size of the
Internet and proprietary implementations of BGP which are not released publicly [5].

In Chapter 3 we discuss the methods we use to enable an emulated, ground truth network 
to provide representative behavior for inter-domain routing networks. We build upon
existing software platforms and use the principles described in this chapter to build emulated 
topologies that scale to larger sized networks than previously possible. 
CHAPTER 3:

Distributed Emulation Methodology 


We recognize the limitations of existing emulation platforms as discussed in Chapter 2, and
endeavor to create our own software platform, the Distributed Emulated Router Inference
Kit (DERIK). DERIK is intended to facilitate the creation of large scale emulated network
topologies. We aim to:

1. Automate the creation of an emulated topology using a model as input into the system 

2. Efficiently distribute the emulated topology across the available physical resources 

3. Provide an extensible platform for application development where researchers, teachers,
or practitioners can create and analyze test scenarios using the virtualized topology

Toward these ends, we extended an existing tool, the Emulated Router Inference Kit (ERIK) [28].
While ERIK has similar high-level goals and utilizes a similar implementation, it is confined
to performing the emulation on a single physical machine. We therefore seek to modify
ERIK to enable a single emulated topology to span multiple physical servers. We discuss the
specific features provided by ERIK, the current state of DERIK, and the methods we used
to efficiently distribute the topology across multiple physical resources.


3.1 Differences in DERIK from ERIK 

• Supports connections between physical servers 

• Efficiently allocates emulated devices to physical servers 

• Supports templates for scripts and configurations 

The work done by Rye in the development of ERIK provided a starting point for automating
network topology creation, but still constrained the emulated network to a single physical
device. Though emulated devices can communicate between instances of a Dynamips
hypervisor, ERIK requires each hypervisor to be located on the same physical host as there
is no automatic method for forwarding traffic between physical hosts. Figure 3.1 depicts the
structure created by ERIK.

Figure 3.1: ERIK allows interconnections between hypervisors but does not
support interconnections between physical hosts.


Verify hypervisor capability using manual processes 

Our first task was to ensure that the software ERIK used, specifically Dynamips, was
capable of communicating across a physical link. To verify this simply, we used
ERIK and manual deployment to place a small network on two different physical
servers. We then manually configured the physical hosts to forward the necessary traffic
and created the logical connections required by the emulated devices. The desired system
structure is depicted in Figure 3.2.

Figure 3.2: A logical link is required between hypervisors which traverses a
physical link between hosts and allows the emulated devices to communicate
as if they were physically connected.

ERIK requires only the installation of Python 2.7 and a Dynamips binary on the physical
server where it will be executed. In order to automate the required interconnections between
servers, DERIK requires additional packages to be installed on the physical server, including
bridge [43], TunTap [44], expect [45], and Tinc [46]. The bridge, TunTap, and Tinc packages
are used during the creation of virtual interfaces on the physical server which allow the
hypervisor to pass traffic between physical devices. The expect package is used to automate
passing commands into the telnet sessions used by Dynamips. Additionally, we use a local
client (such as a user’s laptop) as the build and deployment source for the platform. This
local client requires Python 2.7 and the following Python packages: Fabric [47], Jinja2 [48],
PuLP [49], ipaddress [50], and networkx [51]. Our physical infrastructure consisted of two
physical servers running Fedora and a local client running Windows 10. Server A had 40
cores and approximately 396 GB of memory, while Server B had 72 cores and approximately
462 GB of memory. We built and deployed the topologies from a Microsoft Surface Pro
4 with an Intel Core i7-6650U and 16 GB of memory. The servers were connected to one
another via a gigabit Ethernet switch and a single network interface on each. The local client
was wirelessly connected to the network and accessed each server using Secure Shell (SSH)
keys via a management interface on each server.

We used ERIK to create the deployment scripts for a simple, two-node network: two
routers connected to one another and sharing a BGP adjacency. ERIK created all of the
required router configurations and Dynamips commands, which we manually transferred
and executed on each physical server. Dynamips natively supports the logical connection
between an emulated device and its physical host using “tap bindings” as a way to bridge
traffic between emulated devices and the physical host [52]. ERIK used this feature to
connect emulated routers to a virtual host running on the same server. We chose to apply
this same methodology for our interconnection between emulated routers on different hosts.

We created a virtual interface called a “tap” interface [44] on each physical host and bound
that tap interface to a port on the emulated device via a Dynamips command. We then
created a virtual network bridge [53] and Virtual Private Network (VPN) on each physical
device using the software suite Tinc [46]. Tinc allows the creation of a VPN between two
devices using a set of configuration files and a daemon running on each host [46].
By bridging the VPN with the tap interface, we were able to transfer traffic from the tap
interface to the VPN, and then across the physical network between the two hosts as depicted
in Figure 3.3. Tinc encapsulates the Ethernet frame generated by the router into a User
Datagram Protocol (UDP) packet which it then sends across a specific socket. The Tinc
daemon on the remote end monitors the same socket and decapsulates the packet before
forwarding it to a specified interface. The two emulated routers were able to communicate
and were unaware that the link between them traversed a VPN.

Multiple inter-server connections 

Understanding that any non-trivial network which is distributed between multiple machines
would require multiple interconnections, we needed to extend our configuration. In order
to create N logical connections which all pass over a single physical link, we were required
to create 2N virtual interfaces (a bridge and a tap for each connection) and N VPNs on each
physical host. The creation of multiple tap interfaces was a single command executed on the
host, but Tinc required separate configuration directories for each VPN, distinct ports for
each VPN, and multiple instances of the daemon in order to differentiate traffic destined for
a specific VPN, as demonstrated in Figure 3.4.
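
The bookkeeping this implies for one host can be sketched as follows. This is a minimal illustration and not DERIK's code: the `LinkPlan` class, the naming scheme (`tap0`, `br0`, `vpn0`), and the base port of 10000 are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkPlan:
    """Resources one host needs for a single logical inter-server link."""
    tap: str       # tap interface bound to the emulated router port
    bridge: str    # bridge joining the tap to the VPN
    vpn_name: str  # doubles as the name of the per-VPN configuration directory
    port: int      # distinct port so each daemon only sees its own traffic


def plan_links(n_links: int, base_port: int = 10000) -> List[LinkPlan]:
    """Give every logical link its own tap, bridge, VPN name, and port."""
    return [
        LinkPlan(f"tap{i}", f"br{i}", f"vpn{i}", base_port + i)
        for i in range(n_links)
    ]


plans = plan_links(3)
virtual_interfaces = [p.tap for p in plans] + [p.bridge for p in plans]
print(len(virtual_interfaces), len(plans))  # prints 6 3
```

For N = 3 the plan contains 2N = 6 virtual interfaces and N = 3 VPNs, each with its own port and configuration directory name.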

Figure 3.3: We create a tap, bridge, and VPN interface on each host and
use the hypervisor to bind the tap to the emulated device. This creates a
logical link between interfaces on the emulated devices.

We deemed the use of encryption for the VPNs unnecessary as our physical network was
secured behind a firewall and the traffic within our emulated network was limited to test
data. Additionally, encryption would hinder our efforts at analyzing network traffic which
traverses the physical interfaces. Therefore, we removed the cipher and digest selections
from our Tinc configurations in order to reduce overhead and complexity [46].

Adding or removing encryption in a particular instance of a topology is a trivial task and
requires the modification of only three lines of code. All Tinc configuration scripts are
built from a template (we describe our use of templates in Section 3.2.4), and the template
contains three lines which disable encryption. Removing those three lines from the template
restores the default behavior, which enables encryption.

Another requirement which differed from the original design of ERIK was the generation
of scripts used for automation. The script generation methods used by ERIK were clear and
simply designed but did not extend well to the requirements of multiple servers and larger
topologies. We determined early that distributing devices required different design patterns
and a different underlying code structure.

Figure 3.4: We manually configure Tinc to create multiple VPN connections
between hosts for each link between emulated routers on different hosts.


3.2 DERIK

DERIK originated as an extension of ERIK. Though many of the concepts used in ERIK
are also used here, the software structure is entirely different for DERIK and is depicted in
Figure 3.5. We segmented the topology creation process into five stages:

1. Creation of an abstract graph object: We use a custom graph class to define an abstract 
graph object and map a pre-made model. 

2. Creation of an abstract emulated topology object: We define a topology as a set of 
devices and links and map a graph to said topology. 

3. Distribution of emulated device objects to physical server objects: We use several 
methods to allocate emulated devices to physical hosts. 

4. Creation of build scripts: We create configuration files for the emulated devices and
scripts to automate interface creation and process execution for the physical hosts.

5. Execution of build scripts: We use the local client to issue the commands required to
execute the created scripts on the servers.

Figure 3.5: Builder classes take input and create new objects. The deployer
class executes the created scripts on the remote servers.

We attempted to use as many software development best practices as possible during the
creation of DERIK, such as abstraction, modularity, extensibility, and code reuse. For
example, in the instantiation of new device objects, we use the Factory Pattern [54] to
dynamically create applicable device objects using a device factory. When the topology
builder class maps a vertex, it checks the vertex metadata and recognizes the vertex represents
a Cisco C7200 series router. The builder uses a generic Device Factory (instead of a special
Cisco Device Factory) to create a new device. The factory understands that a new Cisco
C7200 router is desired and creates such. Device objects inherit general attributes from
a generic grandparent class, a more specific parent class, and then are instantiated as a
specific model. For example, a Cisco C7200 series router inherits from a Cisco device,
which inherits from a generic device, which inherits from an Autonomous System object.
This greatly reduced the amount of repeated code and simplified the design of configuration
generation. These design choices also improve the extensibility of the code, enabling the
addition of other makes and models of devices in the future. If desired, another developer
could simply add a Cisco 3700 series class, which inherits from the Cisco parent, and add
the minor changes which differentiate it from a 7200 series. Additionally, we attempted
to write highly granular, testable code and created tests which verify correctness for small
blocks of code throughout much of the package.
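
The inheritance chain and generic factory described above can be sketched as follows. This is an illustration under assumptions, not DERIK's source: the class names, the `register` decorator, and the metadata keys (`make`, `model`, `hostname`, `asn`) are hypothetical.

```python
class AutonomousSystem:
    def __init__(self, asn: int):
        self.asn = asn


class Device(AutonomousSystem):
    """Generic device; concrete makes and models refine these attributes."""
    make = "generic"
    model = "generic"

    def __init__(self, hostname: str, asn: int):
        super().__init__(asn)
        self.hostname = hostname


class CiscoDevice(Device):
    make = "cisco"


class CiscoC7200(CiscoDevice):
    model = "c7200"


class DeviceFactory:
    """Generic factory: maps (make, model) metadata to a concrete class."""
    _registry = {("cisco", "c7200"): CiscoC7200}

    @classmethod
    def register(cls, device_cls):
        cls._registry[(device_cls.make, device_cls.model)] = device_cls
        return device_cls

    @classmethod
    def create(cls, vertex_metadata: dict) -> Device:
        key = (vertex_metadata["make"], vertex_metadata["model"])
        if key not in cls._registry:
            raise ValueError(f"unsupported device {key}")
        return cls._registry[key](vertex_metadata["hostname"], vertex_metadata["asn"])


# Adding another model only needs a new subclass with the minor differences.
@DeviceFactory.register
class CiscoC3700(CiscoDevice):
    model = "c3700"


router = DeviceFactory.create(
    {"make": "cisco", "model": "c7200", "hostname": "r1", "asn": 64500}
)
print(type(router).__name__, isinstance(router, AutonomousSystem))  # prints CiscoC7200 True
```

The builder never names a concrete class; it only passes vertex metadata to the factory, which is what keeps new makes and models cheap to add.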


3.2.1 Creation of a graph object 

The graph creation functionality of DERIK provides an abstraction between third party 
packages and the emulated topology object. ERIK relied on the NetworkX [51] Python 
Package for model development and graph manipulation [28] and we also use NetworkX 
for the same purposes. In order to more loosely couple the topology building process to the 
model, we added the additional level of abstraction in the form of a graph object. This will 
provide flexibility if future users choose to use another package besides NetworkX. 

A DERIK graph is modeled after a standard mathematical graph, defined as a set of vertices 
and edges: 

G = (V,E) 

where each vertex and edge is also an object. In the model, a vertex is representative of
a router, and an edge is representative of a link connecting two routers. The graph itself,
each vertex, and each edge include attributes which create a set of metadata about the entire
graph object. Figure 3.6 is a Unified Modeling Language (UML) diagram of the graph,
edge, and vertex classes. This UML diagram depicts the attributes which are inherited from
the parent class, and additional attributes within the child classes which extend the parent.

DERIK’s graph builder functionality takes a graph model as input in Gephi, GraphML, or
GML file formats and uses the NetworkX Python package to read and construct a NetworkX
graph object. It then maps the NetworkX graph object into a generic DERIK graph object
and computes some additional derived attributes of the graph which can be later used for
both graph and network analysis. The vertex and edge objects also contain specific device
or link data, allowing network information such as AS number to be added within a model
instead of created by the system.
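
A minimal sketch of such a mapping is shown below. To stay dependency-free it accepts plain `(id, attrs)` and `(u, v, attrs)` tuples, the same shape that NetworkX yields from `nodes(data=True)` and `edges(data=True)`; the class and attribute names, and the choice of derived attributes (vertex degree, graph order and size), are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Vertex:
    vertex_id: int
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Graph:
    """Vertices and edges live inside the graph (composition)."""
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)
    attributes: Dict[str, object] = field(default_factory=dict)


def build_graph(
    nodes: Iterable[Tuple[int, dict]], links: Iterable[Tuple[int, int, dict]]
) -> Graph:
    graph = Graph()
    for node_id, attrs in nodes:
        graph.vertices[node_id] = Vertex(node_id, dict(attrs))
    for u, v, attrs in links:
        graph.edges.append(Edge(u, v, dict(attrs)))
    # Derived attributes, computed once so later stages need no graph library.
    for vertex in graph.vertices.values():
        vertex.attributes["degree"] = 0
    for edge in graph.edges:
        graph.vertices[edge.source].attributes["degree"] += 1
        graph.vertices[edge.target].attributes["degree"] += 1
    graph.attributes["order"] = len(graph.vertices)
    graph.attributes["size"] = len(graph.edges)
    return graph


g = build_graph(
    nodes=[(1, {"asn": 64500}), (2, {"asn": 64501})],
    links=[(1, 2, {"weight": 1})],
)
print(g.attributes["order"], g.vertices[1].attributes)
```

Because later stages only see `Graph`, `Vertex`, and `Edge`, replacing NetworkX with another reader would only change the code that produces the two input iterables.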
Figure 3.6: We consider a composition relationship between edges and nodes
as they do not exist on their own, but exist within a graph.


3.2.2 Creating an emulated topology object 

We chose to add another layer of abstraction into the system in order to simplify the 
distribution of a topology and increase the extensibility of the system. The topology builder 
class takes a DERIK graph object as input and maps the graph into an emulated topology 
object. For example, the topology builder class reads each vertex object and creates a device 
object based on the attributes of the vertex. There is a one-to-one correspondence between
a vertex and a router, and a link exists between two routers wherever there is an edge between
the corresponding vertices. Figure 3.7 is a partial UML diagram demonstrating the
relationship similarities.

Figure 3.7: We consider a composition relationship between links and devices
as they do not exist on their own, but exist within a topology.

This additional level of abstraction allowed us to dynamically create configuration
scripts for the device and hypervisor command scripts in a loosely coupled fashion.
For example, in a real-world network two routers can connect to one another even if one
is a Juniper router and one is a Cisco router. Both have interfaces, Internet Protocol (IP)
addresses, and may have access lists; but the configuration syntax required to administer the
routers may be quite different. By abstracting these physical devices into abstract device
objects and separating them from the abstract graph model, we add flexibility to our
model input and simplify the creation of configurations and build scripts later. In our
implementation, we had our device class inherit from an Autonomous System class. This
method allowed us to easily conduct BGP relationship analysis and apply it towards each
device during the creation of device configuration files. Using this pattern allowed us to
more easily apply desired routing policies, which we discuss in more detail in Section
3.2.4. Figure 3.8 shows the inheritance we used to simplify the instantiation of device
objects.

In addition to mapping a graph, the emulated topology builder also creates BGP relationships 
within the topology based on the metadata provided from the model. This derived metadata 
is generic enough to transcend make and model, and is used later in the configuration 
building step. The overall software flow is depicted in Figure 3.5. 

3.2.3 Distributing 

Figure 3.8: A device inherits from the Autonomous System class and specific
makes and models of devices inherit further. Some methods are overloaded,
such as the “add-slots” method, and are specific to a model of device.

In order to distribute the emulated topology, the distribution class takes in the emulated
topology, available resources (as an infrastructure object), and the distribution objective as
parameters. An infrastructure object is a representation of the available physical resources.
For example, an infrastructure may consist of 20 servers, each of which is capable of
supporting 10 emulated routers, connected by a physical switch. This is then
modeled as a list of server objects with distinct attributes. The distribution calculation
parameter indicates which method of distribution to use: uniform, heuristic, or min-load.
The distributor then allocates emulated devices to physical servers and returns a modified
infrastructure object, where the server attributes have been updated to indicate which devices
will be allocated to each server.


Uniform distribution 

We first considered a uniform distribution where devices are allocated to each server in a
“round-robin” fashion. In this method, each server would be allocated one device before
any server is allocated a second device. This pattern continues until all devices have been
allocated, resulting in a distribution where all servers have an equal or nearly equal number
of devices. Though a uniform distribution is simple and quickly calculated, ideally we
desire to allocate the devices in as efficient a manner as possible for some given metric.

When the developers of ERIK placed a network on a server where the amount of memory
allocated to the routers exceeded the amount of memory available on the physical server,
the emulated devices began failing and behaving in unexpected ways. This behavior
indicated that the bottleneck in the distribution would be the available memory on a given
physical host. To overcome this bottleneck, we added a “capability” attribute
for each physical host to the infrastructure model, where capability is a number indicating
how many emulated devices we expect a server can accommodate. For instance, we initially
allocated 512 MB of memory to each device. If a server has 2048 MB of free memory, we
would input a capability of 4 (4 × 512 MB = 2048 MB) for the server. This is a manual
indicator of the capacity of the machine and prevents the distributor from allocating more
devices than the host is capable of supporting. In our cases we assume identical processing
and memory requirements for each emulated router, which reduces the complexity of
distribution with regard to these characteristics. If one were to create a model where each
individual router can have different memory and processing requirements, techniques such as
“Bin Packing” could be an applicable alternative [55].
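
A round-robin allocation that respects the capability attribute can be sketched as follows. This is a minimal illustration, assuming capabilities are supplied as a dictionary of server name to device count; the function names are hypothetical and this is not DERIK's distributor.

```python
from typing import Dict, List


def capability(free_memory_mb: int, memory_per_device_mb: int = 512) -> int:
    """Number of identical emulated devices that fit in the free memory."""
    return free_memory_mb // memory_per_device_mb


def distribute_uniform(
    devices: List[str], capabilities: Dict[str, int]
) -> Dict[str, List[str]]:
    """Deal devices out one server at a time, skipping servers that are full."""
    if len(devices) > sum(capabilities.values()):
        raise ValueError("infrastructure cannot support this many devices")
    servers = list(capabilities)
    allocation: Dict[str, List[str]] = {server: [] for server in servers}
    index = 0
    for device in devices:
        # The capacity check above guarantees some server still has room.
        while len(allocation[servers[index]]) >= capabilities[servers[index]]:
            index = (index + 1) % len(servers)
        allocation[servers[index]].append(device)
        index = (index + 1) % len(servers)
    return allocation


caps = {"server_a": capability(2048), "server_b": capability(1024)}
print(distribute_uniform(["r1", "r2", "r3", "r4", "r5", "r6"], caps))
```

Here `server_a` has a capability of 4 and `server_b` a capability of 2, so the last device is placed on `server_a` once `server_b` is full.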


Optimal min-load distribution 

We determined that the next bottleneck could be the physical interface on a physical host, and
a uniform distribution does not account for limits on the physical links between machines.
With a large network, there is potential for many VPNs to traverse a single physical link,
and therefore the potential to saturate the physical link, thereby impacting the emulated
network. We therefore consider distributions of devices that optimize some criterion, such
as minimizing the number of required virtual interfaces or minimizing the amount of traffic
across the physical link VPNs. We decided that generating a traffic model for a specific
network was not within the scope of this project and instead decided to use the metric of
“edge betweenness centrality” as an indicator of potential traffic load. However, if a specific
traffic model is known, it can be added to the network model as traffic load and used in place
of our chosen edge betweenness. Rye et al. also used betweenness centrality as a measurement
to determine key links for fault analysis [28]. In references [28], [51], [56] link betweenness
is defined as:

$$b(e_i) = \sum_{s,t \in V(G)} \frac{\sigma(s,t \mid e_i)}{\sigma(s,t)}$$

where $\sigma(s,t)$ is the number of geodesics (shortest paths) in a graph $G$ between vertices
$s$ and $t$, and $\sigma(s,t \mid e_i)$ is the number of those geodesics which contain the edge $e_i$.
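
The definition can be evaluated directly for small, connected, unweighted graphs with the sketch below. It is a straightforward pairwise computation written for clarity rather than the faster algorithm a graph library would use; it sums over unordered vertex pairs and does not normalize the result, both of which are choices made for this example.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each vertex."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(edges):
    """Sum, over unordered pairs s,t, the fraction of geodesics using each edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {s: bfs_counts(adj, s) for s in adj}
    result = {}
    for u, v in edges:
        total = 0.0
        for s, t in combinations(sorted(adj), 2):
            dist_s, sigma_s = info[s]
            dist_t, sigma_t = info[t]
            through = 0
            # The edge lies on a geodesic if it can be crossed in either direction
            # without lengthening the s-t distance.
            for a, b in ((u, v), (v, u)):
                if dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    through += sigma_s[a] * sigma_t[b]
            total += through / sigma_s[t]
        result[(u, v)] = total
    return result


# A six-node tree: two hubs (1 and 2), each with two leaves.
tree = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
print(edge_betweenness(tree)[(1, 2)])  # prints 9.0
```

The hub-to-hub edge carries all nine pairs that have one endpoint on each side, while each leaf edge carries only the five pairs involving its leaf.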

For example, consider the graph G depicted in Figure 3.9. If we have two available physical
servers, each capable of supporting three emulated devices, an allocation for minimizing
the sum of the load across the physical link between servers is depicted in Figure 3.10. If
each router sends a single packet to every other router, we observe a total of 30 packets sent,
18 of which traverse the physical link. Alternatively, if we select a random distribution as
depicted in Figure 3.11 we would have a total of 35 packets traversing the physical link. This
simple example illustrates the importance of thoughtful allocation in a potentially resource
constrained environment.
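
The counts of 30 and 18 can be reproduced with the sketch below. The edge list and the allocation are assumptions: they describe a six-node tree consistent with the description of Figures 3.9 and 3.10 (nodes 1 and 2 as providers, each with two customers, and each provider placed with its own customers), not data read from the figures themselves.

```python
from collections import deque
from itertools import permutations

EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6)]  # assumed layout
SERVER_OF = {1: "A", 3: "A", 4: "A", 2: "B", 5: "B", 6: "B"}  # assumed allocation


def shortest_path(adj, source, target):
    """Breadth-first search returning one shortest path as a list of nodes."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def count_packets(edges, server_of):
    """Return (packets sent, packets that use the physical link at least once)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    sent = crossing = 0
    for s, t in permutations(adj, 2):  # one packet per ordered pair of routers
        path = shortest_path(adj, s, t)
        sent += 1
        if any(server_of[a] != server_of[b] for a, b in zip(path, path[1:])):
            crossing += 1
    return sent, crossing


print(count_packets(EDGES, SERVER_OF))  # prints (30, 18)
```

Under these assumptions only the link between nodes 1 and 2 crosses servers, so exactly the 18 packets exchanged between the two groups of three traverse the physical link.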

Figure 3.9: A simple graph model of six nodes connected by five edges. This
could be a small example of two tier one providers (nodes 1 and 2) and four
customers (nodes 3, 4, 5, and 6).

Figure 3.10: If load on the physical link is measured as the sum of the loads
of the simulated links, then the “load” on the physical link is minimized.

Figure 3.11: If load on the physical link is measured as the sum of the loads
of the simulated links, then the “load” on the physical link is not minimized.
A packet flowing from node 1 to node 6 will cross the physical link twice,
despite both emulated devices running on the same physical machine.

The method we developed for optimizing allocation was through the use of a linear program,
specifically a mixed integer program, expressed in NPS format as follows:

Indices

$r$   router, $r = 1, 2, \ldots, R$, where $R$ is the number of routers in the topology
$m$   machine, $m = 1, 2, \ldots, M$, where $M$ is the number of physical machines available

Parameters

$T_{r,r'}$   The expected traffic between Router $r$ and Router $r'$ [B/s]
$\mathit{Max}_m$   The maximum number of routers machine $m$ is capable of holding

Decision Variables

$R_{r,m}$   Router $r$ is on machine $m$ [binary]
$A_{r,r',m}$   Routers $r$ and $r'$ are not both on machine $m$ [binary]

Constraints and objective function

Minimize the amount of traffic between routers on different physical machines.

$$\text{Minimize} \sum_{r,r',m} T_{r,r'} \, A_{r,r',m}$$

subject to:

Cannot place more routers on a server than the server is capable of
$$\sum_{r} R_{r,m} \le \mathit{Max}_m, \quad \forall m$$

Each router can only be on one machine
$$\sum_{m} R_{r,m} = 1, \quad \forall r$$

Model an ‘and’ constraint for two routers on different physical machines
$$A_{r,r',m} \le R_{r,m}, \quad \forall r, r', m$$
$$A_{r,r',m} \le (1 - R_{r',m}), \quad \forall r, r', m$$
$$A_{r,r',m} \ge R_{r,m} + (1 - R_{r',m}) - 1, \quad \forall r, r', m$$

The program attempts to calculate the lowest sum, relative to all possible sums, of the load
on physical interfaces. This is accomplished by multiplying the link load by the binary value
indicating whether the devices incident to a given edge are on the same physical host, for
all edges. Due to our use of Python for developing DERIK, our implementation consisted
of converting this program into Pythonic form using the PuLP package [49]. As data,
we used our available server infrastructure as described in Section 3.1, emulated devices,
and a matrix generated from the link-betweenness values of our emulated topology. Our
implementation first analyzes the servers’ capabilities and, if the servers are incapable of
supporting the specific number of devices, returns an error stating such. Additionally, if a
single server can support all the devices, the solver immediately returns the trivial solution
of all devices allocated to a single server. PuLP also installs the free CoIN-OR solver [49]
by default, which we then used to solve the linear program and provide an allocation of
devices to servers. The results were stored as a Python dictionary, with the server id as a
key, and a list of device ids as the value. This dictionary was then used to add a list of device
objects to each specified server’s attributes. As shown in Figure 3.5, this entire process is
automated within DERIK.
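
For very small topologies, the objective and constraints can be checked without a solver by enumerating every allocation. The sketch below is such an exhaustive search, offered as a stand-in for the PuLP and CoIN-OR implementation rather than a copy of it; the function name, the six-node example, and its load values are assumptions for illustration.

```python
from itertools import product


def min_load_allocation(routers, capacities, traffic):
    """Try every assignment of routers to machines and keep the cheapest one.

    traffic maps a (router, router) link to its load; each link is listed once.
    Returns (cost, assignment), or (None, None) if no assignment is feasible.
    """
    machines = list(capacities)
    best_cost, best_assignment = None, None
    # Each tuple places every router on exactly one machine.
    for choice in product(machines, repeat=len(routers)):
        # Capacity constraint: no machine holds more routers than it supports.
        if any(choice.count(m) > capacities[m] for m in machines):
            continue
        assignment = dict(zip(routers, choice))
        # Objective: total load of links whose endpoints are on different machines.
        cost = sum(
            load for (r, r2), load in traffic.items()
            if assignment[r] != assignment[r2]
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Assumed six-node example with unnormalized link betweenness as the load.
traffic = {(1, 2): 9, (1, 3): 5, (1, 4): 5, (2, 5): 5, (2, 6): 5}
cost, assignment = min_load_allocation(
    [1, 2, 3, 4, 5, 6], {"A": 3, "B": 3}, traffic
)
print(cost, assignment)
```

The search visits M^R assignments, so it is only practical as a correctness check on toy inputs.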

A limitation with this approach is the sheer size of the linear program. Though simple in its
expression, as the number of devices increases, the number of potential solutions increases
exponentially. The problem itself is still linear, but the number of constraints increases as
the square of the number of devices. This large size places a significant burden on the solver
and increases both the memory requirements of the host running the solver software and the
time required to complete. The solver may be able to find a feasible integer solution, and
find a lower bound (on a minimization problem) which is not an integer, but will continue
to run until it finds the optimal solution, which can take hours or days depending on the
size of the topology and the solver software used. Figure 3.12 depicts solve times using a
Microsoft Surface Pro 4 and the CoIN-OR solver.

In an attempt to reduce the running time, we implemented another feature of linear
programming, fractional gaps, as a configurable variable in the distribution method. Implementing a
fractional gap allows the solver to return a feasible solution which is within a percentage of
its known best possible answer, even if the optimal solution is not yet reached. This feature
reduces the solve time but may return a sub-optimal, though feasible, solution. Additionally,
it allows the solver to have a set time limit for the solving process. Without the gap and
time limit, a near-optimal solution may be found quickly, but the solver will continue to run
as long as necessary to find the optimal solution.
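
One common way to express the fractional gap stopping rule is sketched below. The symbols $z_{\text{best}}$ (best feasible objective value found so far), $z_{\text{bound}}$ (best known bound), and $g$ (the configured gap) are our own notation, and the exact normalization differs between solvers.

```latex
\frac{\lvert z_{\text{best}} - z_{\text{bound}} \rvert}{\lvert z_{\text{best}} \rvert} \le g
```

For example, with $g = 0.05$, a minimization whose best feasible solution has a value of 105 and whose lower bound is 100 has a gap of $5/105 \approx 0.048$, so the solver may stop and return the feasible solution.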
Figure 3.12: As the size of the graph increases, there is an exponential trend
in the time it takes to solve the problem.

Heuristic distribution 

In order to provide another, faster option for large topologies, we also developed a heuristic
approach to distribution. This approach has a significant runtime improvement with a time
complexity of O(n log n) where n is the number of edges in a graph G [57]. For our
heuristic distribution, we simply sort the edges by the link-betweenness value and allocate
both devices incident to each edge to a single server until the maximum capacity of the server
is reached, at which point we begin allocating devices to the next server. This method is
depicted in pseudo code in Algorithm 1.
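
A runnable sketch of this heuristic is given below. Two details are assumptions on our part: edges are visited from highest to lowest betweenness, so that the busiest links are the most likely to stay on one server, and a device that was already placed by an earlier edge is skipped. The function and parameter names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Link = Tuple[Hashable, Hashable, float]  # (device, device, link betweenness)


def distribute_heuristic(
    links: List[Link], capacities: Dict[str, int]
) -> Dict[str, List[Hashable]]:
    """Fill servers in order, placing the endpoints of the busiest links first."""
    devices = {d for u, v, _ in links for d in (u, v)}
    if len(devices) > sum(capacities.values()):
        raise ValueError("infrastructure cannot support this many devices")
    servers = list(capacities)
    allocation: Dict[str, List[Hashable]] = {s: [] for s in servers}
    placed = set()
    index = 0
    # Sorting dominates the running time: O(n log n) for n links.
    for u, v, _ in sorted(links, key=lambda link: link[2], reverse=True):
        for device in (u, v):
            if device in placed:
                continue
            # Move on once the current server is full; the check above
            # guarantees a later server has room.
            while len(allocation[servers[index]]) >= capacities[servers[index]]:
                index += 1
            allocation[servers[index]].append(device)
            placed.add(device)
    return allocation


links = [(1, 2, 9.0), (1, 3, 5.0), (1, 4, 5.0), (2, 5, 5.0), (2, 6, 5.0)]
print(distribute_heuristic(links, {"A": 3, "B": 3}))
# prints {'A': [1, 2, 3], 'B': [4, 5, 6]}
```

On this assumed six-node input the heuristic keeps the busiest link on one server but separates three lighter links, which illustrates that the faster method does not guarantee the minimum load. Devices with no links never appear in `links` and would need to be placed separately.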

3.2.4 Building scripts 

Manually executing the creation and deployment of a topology becomes cumbersome as 
the size of the topology grows. We desired to automate the process as much as possible and 
developed DERIK to create a series of scripts which can be run automatically as part of 
the software flow. For example, if DERIK is deploying a topology to a Linux server, it will
automatically create bash scripts that execute the commands to create tap interfaces and
bridges. These scripts are built from templates and require no additional user interaction
once a model has been generated and input into DERIK.

Algorithm 1 Heuristic Distribution Pseudo code
    topology.links.sort(key = link_betweenness)
    current_server_index = 0
    current_server = all_servers[current_server_index]
    for link in topology.links do
        for device incident to link do
            if device is already allocated then
                continue
            if current_server.num_devices_allocated >= current_server.max_capability then
                current_server_index += 1
                current_server = all_servers[current_server_index]
            current_server.add_device(device)

In order to minimize the dependency on a particular make or model of emulated router 
and to simplify the creation of device configuration scripts, we choose to use a templating 
engine, Jinjal in the creation of our build scripts and configuration files [48]. For example, 
the Script Builder takes a Device object (discussed in Section 3.2.2) as input and creates 
a configuration file based on a template for the particular make and model contained in 
the Device object’s attributes. We further extended this by using template inheritance and 
nested templates in order to minimize file Input-Output (10) and simplify the administration 
of templates. For example, many Cisco operating systems contain similar sections, syntax, 
and commands for a baseline router configuration so we create a base “Cisco” template. If 
a model “c7200” router has a slightly different syntax for access-list creation than model 
“Nexus 7000”, we can simply override the parent access-list template by creating a new 
“nexus-7000” directory containing a modified access-list template. The configuration 
builder analyzes a device object’s attributes, including make and model, and selects the 
appropriate templates to use for the generation of the configuration file. Each template 
contains fields which correspond to dynamic information required by the configuration but 
specific to each instance of a device. For example, a baseline configuration may contain the 
field “hostname”, which is then populated with the device’s attribute, “hostname”. 
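
The override behavior described above can be illustrated with a small sketch. It deliberately uses Python's standard-library string.Template as a stand-in for the templating engine, so it shows only the selection logic (a model-specific template wins over the vendor default) and field substitution. The template keys, the template text, and the device dictionaries are hypothetical placeholders, not verified device syntax.

```python
from string import Template

# Hypothetical template store: keys mimic "<directory>/<template name>".
TEMPLATES = {
    "cisco/base": Template("hostname $hostname\n"),
    "cisco/access-list": Template("access-list 10 permit $network\n"),
    "nexus-7000/access-list": Template("ip access-list acl10\n  permit ip $network any\n"),
}


def select_template(model, name, vendor="cisco"):
    """Return the model-specific template if one exists, else the vendor default."""
    default = TEMPLATES["%s/%s" % (vendor, name)]
    return TEMPLATES.get("%s/%s" % (model, name), default)


def build_config(device):
    """Render the base section and the access-list section for one device."""
    parts = []
    for name in ("base", "access-list"):
        template = select_template(device["model"], name)
        # Fields such as $hostname are filled from the device's attributes.
        parts.append(template.substitute(device))
    return "".join(parts)
```

A device whose model has no directory of its own (here "c7200") falls back to the shared "cisco" templates, while a "nexus-7000" device picks up only the one template that differs.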

The configuration builder class uses the templates, device object attributes, and derived data
from the emulated topology to create the configuration files for each device. The derived
data computed by the topology builder class is necessary for specific configurations such
as BGP neighbor commands. The authors of ERIK chose to implement some standard
traffic shaping policies in BGP configuration, such as preferring customer traffic over peer
or provider; in order to generate these configurations automatically, some derived attributes
needed to be calculated for relationships between nodes. One significant difference from
ERIK that we implemented is the separation of configurations into their own files instead of
encoding them into base64 and pushing them via the hypervisor commands. As the number
of links per router increases, the number of BGP neighbors increases and the total size of
the configuration increases. Dynamips has a limit on the size of the message that can be
sent via its console input and we quickly exceeded that size limit [52]. Instead, we used
another Dynamips command, vm set_config, in order to provide a location for a configuration
file stored on the server.

We also use the templating methodology for the creation of scripts which will run on the 
physical hosts. For example, scripts which start and configure the hypervisor daemons and
VPN daemons are dynamically created based on the type and number of physical hosts, 
emulated devices, and inter-connections required. All the scripts and configurations are 
then saved locally in order to provide test repeatability and for later analysis. 

3.2.5 Execution of build scripts 

The deployment of an emulated topology to the physical servers is executed from the local 
client and can be broken into four sequential steps, each dependent on the prior steps: 

1. Directory structure creation 

2. Transfer of scripts and configuration files 

3. Creation of virtual interfaces 

4. Creation of emulated devices 

In order to automate the execution of the scripts we used the Python package “Fabric”
[47]. Fabric provides an interface for automating SSH functionality using the Python
programming language. In our case we used it to automate the transfer of files created in
Section 3.2.4 to remote servers and the execution of commands on the remote servers. For
example, as described in Section 3.2.4 we create a script which sends console commands to
a hypervisor. Fabric is used to push the script to the remote server and then automates the
execution of the script at the appropriate time.

Dynamips uses a telnet session to receive the commands necessary to create or modify new
devices. Automating this process proved challenging due to the “nested” sessions required,
and we decided to use the UNIX “expect” utility to execute pre-created scripts. In order to
parallelize the process as much as possible, we create separate scripts for each hypervisor
which can be started in parallel by a bash script, which in turn is started as part of the
deployment process by a remote command issued from Fabric. With the potential for many
instances of hypervisors running on a server simultaneously, and each hypervisor script
containing up to 1000 commands, this greatly reduced the run time for the deployment step
versus running in sequence.

CHAPTER 4:
Results and Analysis


In this chapter we first discuss the results obtained creating emulated topologies using 
DERIK. We then detail the analysis we conducted on the methods used to distribute the 
topologies. Finally, we examine the execution of a BGP hijack scenario we conducted as an 
example of the utility of DERIK as an emulation platform for large topologies. 


4.1 Analysis of DERIK Topologies 

Our initial goal was to verify that DERIK was able to create a topology of over 1000 
emulated devices running on two physical servers. This would create an emulated network 
over three times larger than Rye et al., and meet our objective of distributing the topology
across physical machines. 

We first analyzed a topology created by DERIK using a random Barabási-Albert model
as input where each node represented a single router and single AS. As input into the
NetworkX random graph generator, we provided an order of 1001 and an attachment value
of 3. The attachment value, which the generator takes as an integer, indicates how many
edges are created between a node added to the graph and the existing nodes; each new
vertex therefore preferentially attaches to three existing vertices. The resulting graph is
sensitive to this attachment parameter; minor changes can cause large differences in the
total number of graph edges. With too few edges in the model, the resulting network was
sparse and not representative of the Internet structure desired. Alternatively, a large number
of edges within a graph created high degree nodes that we were unable to emulate due to
restrictions on the total interface count on a single emulated router. For example, a Cisco
7200 series router can support six network modules, each with eight interfaces, for up to 48
interfaces per device. Relaxing the one-to-one mapping of device to AS is a valuable
exploration for future work and is discussed in Chapter 5.

In order to reduce the size of the graph after creation, we iterated through each edge and
randomly deleted each with a probability of 0.25, which resulted in a total size of 2,280
edges. In order to approximate the structure of the Internet, we divided our model into three
tiers using similar parameters to those described by Rye et al. [28], though we were able
to reduce the total percentage of Tier 1 nodes due to the larger size of the network. The
topology was partitioned into 20 Tier 1, 300 Tier 2, and 681 Tier 3 ASs.
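
As a rough sanity check on the edge count, the expected size after thinning can be worked out by hand. The calculation below assumes that the generator adds exactly m edges for each vertex after the first m (so n = 1001 and m = 3 give m(n - m) initial edges) and that each edge is deleted independently with probability p = 0.25; both are assumptions about the procedure rather than figures reported in the text.

```latex
\begin{align*}
|E_{\text{initial}}| &= m\,(n - m) = 3 \times (1001 - 3) = 2994 \\
\mathbb{E}\big[|E_{\text{kept}}|\big] &= (1 - p)\,|E_{\text{initial}}| = 0.75 \times 2994 = 2245.5 \\
\sigma &= \sqrt{|E_{\text{initial}}|\,p\,(1 - p)} = \sqrt{2994 \times 0.25 \times 0.75} \approx 23.7
\end{align*}
```

Under these assumptions the reported 2,280 edges lies roughly one and a half standard deviations above the expected value, which is consistent with random thinning.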

To ensure that the model was representative of our desired tiered structure, we then analyzed 
the entire graph to ensure it met five characteristics: 

1. All Tier 1 nodes had at least one Tier 1 neighbor. 

2. All Tier 2 nodes had at least one Tier 1 neighbor. 

3. No Tier 3 peers existed. 

4. The maximum number of neighbors for any device did not exceed the maximum 
number of ports the device possessed. 

5. The graph consisted of a single connected component.

If any of these characteristics were not met, our model builder automatically created or
deleted an edge to adjust. When deleting an edge, the model builder verified that the
graph was not disconnected by the edge removal. If the graph was disconnected, the
builder randomly selected a node from each component, verified that it was not breaking
the requirements by adding a Tier 3 peering, and added the edge. When adding an edge,
it randomly selected two nodes which met the requirements (i.e., a Tier 1 node with no
other Tier 1 neighbors, and then another Tier 1 node), and added an edge between the two.
The model builder then repeated the analysis until all requirements were met. The final,
emulated network consisted of 1001 Cisco 7200 series routers, connected via /30 networks,
each advertising a /24 network in BGP as an AS. This specific model is not meant to be
representative of any particular network, but instead was meant to use a common model
development method to test the limits of our platform.
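
The five characteristics can be checked mechanically. The following is a minimal sketch of such a check, not the model builder itself: the adjacency-dictionary representation and the function name are assumptions, and "no Tier 3 peers" is read here as "no link joins two Tier 3 nodes".

```python
from collections import deque


def check_tiered_graph(adj, tier, max_ports):
    """Return the numbers (1-5) of the characteristics the graph violates.

    adj       -- dict: node -> set of neighbor nodes (undirected)
    tier      -- dict: node -> 1, 2, or 3
    max_ports -- maximum number of neighbors a single device may have
    """
    violated = set()
    for node, neighbors in adj.items():
        neighbor_tiers = {tier[n] for n in neighbors}
        if tier[node] in (1, 2) and 1 not in neighbor_tiers:
            violated.add(tier[node])      # characteristic 1 or 2
        if tier[node] == 3 and 3 in neighbor_tiers:
            violated.add(3)               # a Tier 3 to Tier 3 link exists
        if len(neighbors) > max_ports:
            violated.add(4)
    # Characteristic 5: a breadth-first search must reach every node.
    if adj:
        start = next(iter(adj))
        seen, queue = {start}, deque([start])
        while queue:
            for n in adj[queue.popleft()]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(seen) != len(adj):
            violated.add(5)
    return sorted(violated)
```

A repair loop in the spirit of the text would call this check, add or delete one edge for a reported violation, and repeat until the returned list is empty.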

We created and deployed this network using DERIK and the heuristic method of distribution
to two physical servers: 750 emulated routers to Server A, and 251 routers to Server B. The
min-load distribution method for this size network took prohibitively long to execute; we
defer using min-load on very large networks to future research. The heuristic distribution
resulted in 20 Tier 1 nodes, 300 Tier 2 nodes, and 430 Tier 3 nodes on Server A; and 0
Tier 1 nodes, 0 Tier 2 nodes, and 251 Tier 3 nodes on Server B. The physical hosts were
connected to one another by a single physical gigabit Ethernet interface with a physical
switch between them. The percentage capability of each machine was predetermined by
taking the available memory on the server used by Rye et al. in their 300 node test and
extrapolating upward linearly for our servers [28]. These capabilities were relative
to one another and the size of the topology, not the maximum capability of the machines. We
will discuss the servers' full capabilities in our next example. The resulting distribution
created 617 VPNs (and the applicable virtual interfaces) on each machine.

The total process completed in less than 15 minutes, with over 99% of the time used for
execution of the remote commands on each of the servers. Much of the time required is
due to the number of individual commands run to create the virtual interfaces. Due to the
security configuration on our servers, we were required to execute each command remotely.
This process could be expedited by creating and executing a single script, including all
commands, on each server. The deployment was run from a Microsoft Surface Pro 4 connected
via a wireless connection to the network.
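
The single-script approach suggested above could look roughly like the sketch below, which collects every interface command into one bash script that is transferred and executed once per server. The iproute2 commands are an assumption about how the tap and bridge interfaces might be created, and the interface names are hypothetical.

```python
def build_interface_script(tap_bridge_pairs):
    """Return the text of one bash script that creates every tap and bridge.

    tap_bridge_pairs -- list of (tap_name, bridge_name) tuples (hypothetical names)
    """
    lines = ["#!/bin/bash", "set -e"]   # stop at the first failing command
    for tap, bridge in tap_bridge_pairs:
        lines += [
            "ip tuntap add dev %s mode tap" % tap,
            "ip link add name %s type bridge" % bridge,
            "ip link set %s master %s" % (tap, bridge),
            "ip link set %s up" % tap,
            "ip link set %s up" % bridge,
        ]
    return "\n".join(lines) + "\n"
```

Running one generated script per server replaces one remote round trip per command with a single file transfer and a single remote invocation.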

To verify successful deployment, we conducted both manual analysis and automated
analysis. We first established a remote session with each of the physical servers and verified
that the Dynamips and Tinc processes were running. We then consoled into a randomly chosen
router and viewed the BGP routes to verify that the route table was populated. Alternatively,
in Section 4.3 we discuss connecting the topology to the BIRD daemon which provides
an automated way to view the number of BGP routes learned by each device and can be
verified against the number of expected routes. Finally, we conducted some traffic analysis
as described in Section 4.2 to verify that traffic was successfully traversing the physical
interfaces.


4.2 Analysis of Optimization 

In order to test the efficiency of our distribution methods we devised a simple traffic scenario
to determine the amount of traffic traversing the physical interfaces of our physical hosts. We
used tcpdump to monitor and record packets traversing the physical interface on each server.
Tinc uses pre-defined ports for each of the VPN connections and we applied tcpdump filters
to only capture packets on the ports configured for the VPNs. We chose to use Internet
Control Message Protocol (ICMP) traffic rather than something more complex, such as a
file transfer, to establish a lower bound and to ensure we did not immediately overwhelm
any point in the paths. Each ICMP query and reply also contains a sequence number, which
simplifies analysis of loss.

We developed a simple script in which each emulated device pinged every other device
using the “send_con_msg” command in the Dynamips hypervisor. The script was built
as part of the script building process described in Section 3.2.4, pushed to the respective
servers during deployment, and then executed manually via SSH. We chose to send five
100-byte requests, and randomized the order of the destination addresses for each device in
order to prevent all devices from pinging a single destination simultaneously. Additionally,
the script started each ping session with a delay of ^ seconds, where n is the number of 
sessions. This delay reduced simultaneous bursts of traffic throughout the scenario. An 
individual script was created for every hypervisor instance and run in parallel resulting 
in approximately y, where m is the number of devices, simultaneous ICMP requests and 
replies occurring on the servers at any given time. 
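
The randomized destination ordering can be sketched as follows. This is an illustrative sketch only: the function name and the address list are hypothetical, and the start delay between sessions is deliberately left out.

```python
import random


def build_ping_plan(addresses, seed=None):
    """Map each source address to a shuffled list of every other address."""
    rng = random.Random(seed)          # a seed makes a test run repeatable
    plan = {}
    for source in addresses:
        targets = [a for a in addresses if a != source]
        rng.shuffle(targets)           # independent random order per source
        plan[source] = targets
    return plan
```

Because every source shuffles its own target list, the devices are unlikely to ping the same destination at the same moment, which is the effect the randomization is meant to achieve.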

To compare optimization methods, we: i) created topologies of 50, 100, and 150 devices; 
ii) distributed the topology using a uniform (random) distribution, heuristic distribution, 
and min-load distribution; and iii) executed the traffic analysis methods described above 
on each individual scenario. We discovered a significant decrease in the amount of traffic 
observed on the physical link using either a heuristic distribution or a min-load distribution 
versus a random distribution. We chose not to conduct comparisons of topologies larger 
than 150 nodes as the linear program run time for the min-load method exceeded our testing 
time limits. Figure 4.1 depicts the results. 

Note that the uniform distribution caused more traffic to pass through the physical interfaces 
in all topologies analyzed. This is due to a larger number of paths traversing the physical 
interface multiple times, as depicted earlier in Figure 3.11. 
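
The effect of placement on a single path can be made concrete by counting how many times the path moves between servers. The sketch below assumes a path is given as an ordered list of devices and the placement as a device-to-server dictionary; both names are hypothetical.

```python
def count_crossings(path, placement):
    """Count how often a device-level path moves from one server to another."""
    return sum(1 for a, b in zip(path, path[1:]) if placement[a] != placement[b])


# Hypothetical four-hop path under two different placements.
path = ["r1", "r2", "r3", "r4"]
scattered = {"r1": "A", "r2": "B", "r3": "A", "r4": "B"}
clustered = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
```

With the scattered placement the path crosses the physical link three times, while the clustered placement crosses it once, so the same ping generates a third of the traffic on the physical interface.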

Figure 4.1: The amount of traffic is reduced using either the heuristic method
or the min-load method.

Additionally, we analyzed the captured traffic to determine the amount of loss
in a low traffic load scenario. As we originated the ground truth traffic, we knew
how many source/sink ICMP packets should be traversing the link. We were thus able
to programmatically analyze the capture to determine how many packets, and from which
sources/destinations, were lost. For example, we know that five ICMP echo requests should
exist for a (source, destination) tuple. We also know that five ICMP echo replies should exist
for the reverse (destination, source) tuple. We tested on 50, 100, 150, and 300 node networks
and our captures indicated 0 packets lost. As we scaled up the size of our network, we began
losing a small percentage of packets. For example, with a 1001 node network, we detected
a loss of 88 of 6,347,020 packets, or approximately 0.0014% of packets. We had collected
packets on the physical interfaces of both hosts and our captures contained identical results,
indicating that the loss was not due to physical restrictions on the host, but rather to
unresponsive emulated devices at some point in the path. Though we were unable to identify
the root cause of the unresponsiveness, we speculate it may be caused by delayed route
propagation or by the overwhelmed memory or processor of an emulated device, and believe
it worth further research. In Section 5.1 we discuss some opportunities to address loss in
these scenarios.
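
The loss analysis described above amounts to comparing observed packet counts against the known ground truth. A minimal sketch follows, assuming the capture has already been parsed into (source, destination, kind) tuples; the parsing step, the function names, and the tuple layout are assumptions, not the analysis code used for the results.

```python
from collections import Counter


def find_missing(captured, expected_pairs, per_pair=5):
    """Return {(src, dst, kind): missing_count} for under-represented flows.

    captured       -- iterable of (src, dst, kind), kind is "request" or "reply"
    expected_pairs -- iterable of (src, dst) ping pairs expected on the link
    """
    seen = Counter(captured)
    missing = {}
    for src, dst in expected_pairs:
        # Requests travel src -> dst; replies travel dst -> src.
        for key in ((src, dst, "request"), (dst, src, "reply")):
            if seen[key] < per_pair:
                missing[key] = per_pair - seen[key]
    return missing


def loss_percent(lost, total):
    """Express a packet loss count as a percentage of all expected packets."""
    return 100.0 * lost / total
```

The returned dictionary identifies which (source, destination) flows lost packets and how many, which is the information needed to localize an unresponsive device.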


4.3 Proof of Concept Usage 

We desired to demonstrate the utility of DERIK by executing a real-world, non-trivial
application and analyzing the results to understand its effects. As stated in Section 2.4,
BGP is a challenging protocol to simulate and we desired to demonstrate the value of an
emulation platform by developing a BGP hijacking scenario.

A BGP hijack attack is the false advertisement of a prefix which is actually owned by another
AS [58]. This can occur by accident through misconfiguration, or by an adversary for malicious
purposes. These attacks, whether accidental or nefarious, can cause serious effects on
Internet availability. A hijack could take multiple forms as indicated in [58], such as falsely
advertising a connection to another AS or falsely advertising a prefix it does not own. We
chose to execute an attack where the hijacking AS falsely advertises a prefix owned by the
hijacked AS.

Using an emulated topology to execute this attack scenario provides several interesting
insights. Implementing policies on each router (and therefore each AS) using actual Cisco
configurations allows us to observe how these configuration changes directly impact the
effects of a hijack. These changes will directly affect the percentage polluted, rate of
pollution, and parts of the topology polluted due to filtering of the routes advertised between
neighbors. The differences between a topology where no policies are implemented and
a topology containing policy can provide insight which may be missing from a simulated
environment.

To conduct this scenario, we slightly extended DERIK to allocate an unused interface on 
each of the individual routers and bind it to a BIRD [59] daemon running on the host using 
Tap and Bridge interfaces. Binding to unused interfaces is one way in which we anticipate 
users extending and connecting to a DERIK emulated network. 

BIRD is a routing daemon which can import and export BGP routes when connected to
another emulated router, or to the kernel of a Linux machine [59]. We extended DERIK
to automate the creation of a BIRD configuration file to allow it to receive BGP updates,
but not advertise any prefixes of its own. DERIK then automatically configured a BGP
neighbor pairing for each device and the AS advertised by the BIRD configuration. By
doing this we created a simple “looking glass” for every device in the topology using a
single daemon on the host. Figure 4.2 depicts the structure we used. BIRD provides the
added benefit of logging to a Multi-Threaded Routing Toolkit (MRT) dump file, which we
could analyze using a custom Python script.
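
A receive-only BIRD configuration of the kind described here could be generated along the following lines. This is a sketch under stated assumptions: the stanza uses BIRD's "protocol bgp" block with "import all" and "export none", while the local AS number, the addresses, and the protocol names are hypothetical, and MRT logging options are omitted.

```python
def bird_neighbor_stanza(name, neighbor_ip, neighbor_as, local_as):
    """Return one BIRD 'protocol bgp' block that only listens to a router."""
    return (
        "protocol bgp %s {\n"
        "    local as %d;\n"
        "    neighbor %s as %d;\n"
        "    import all;\n"      # accept every route the router advertises
        "    export none;\n"     # advertise nothing back to the router
        "}\n" % (name, local_as, neighbor_ip, neighbor_as)
    )


def build_bird_config(devices, local_as=65000):
    """devices -- list of (hostname, ip_address, as_number) tuples."""
    return "\n".join(
        bird_neighbor_stanza("peer_%s" % host, ip, asn, local_as)
        for host, ip, asn in devices
    )
```

One stanza per emulated router gives the single daemon a session with every device, which is what makes the "looking glass" view possible.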

We then used the same “send_con_msg” feature of Dynamips to enable us to automate
scenarios via a script such that different prefixes are hijacked at various locations within
the topology. For example, a Tier 1 node could hijack a prefix already advertised by a Tier
2 node. For this scenario, we randomly selected a Tier 1 node as the attacker and a Tier 2
node as the victim. By modifying the configuration of the Tier 1 node to advertise the prefix
currently advertised by the Tier 2 node, the topology was polluted and some other nodes
now believed the Tier 1 node owned the prefix. For our purposes, the random selection of
nodes was sufficient for demonstration, but the precise selection based on characteristics of
the topology or nodes is another valuable method that could be considered in the future.

Figure 4.2: BIRD monitors the Bridge interfaces which allow it to receive
BGP updates as a neighbor of the emulated routers. Each device is bound
to a unique bridge interface and BIRD pairs using the required address and
AS number as configured.

We executed a single hijack after waiting for the BGP paths to fully converge. The BIRD
daemon provides an interface where we were able to see the number of advertisements
provided by each node. Once the count reached the expected value, we knew the paths had
converged. We then executed the attack and waited for the results of the hijack to converge
using the same methods above. Once the polluted paths had converged, we stopped the
BIRD daemon and saved the dump file for analysis. After removing the hijacked prefix,
we allowed the network to re-converge before executing the next scenario. We analyzed
the dump file by modifying scripts created by Jon Oberheide [60] which utilized the dpkt
package [61]. The pseudo code outlining our method to find a polluted path update is shown
in Algorithm 2.

We created a small topology of 50 nodes using the model building method described in
Section 4.1. We chose a smaller topology to demonstrate the extensibility of DERIK and
for ease of analysis. This resulted in five Tier 1 nodes, 15 Tier 2 nodes, and 30 Tier 3 nodes.
Our topology was built using templates which included the policies described previously
in Section 3.2.4. For example, a provider prefers customer traffic over peer traffic. We
executed nine scenarios using the methods described earlier, one for each pairing of attacker
tier and victim tier: a Tier 1 node hijacks a prefix owned by another Tier 1 node, then one
owned by a Tier 2 node, and so on. We analyzed the dump
files and charted the effectiveness of a node within a particular tier hijacking nodes within
each other tier. The results are displayed in Figure 4.3. Additionally, the updates from
the dump file include a time stamp so we were able to analyze timing information, such as
convergence time, for a particular scenario.

Using DERIK to conduct these scenarios provided us with several benefits: 

1. We were able to conduct the hijack using the actual behavior of Cisco routers running 
BGP and were not required to develop any special simulations. 

2. We were able to configure the routers running BGP using the same configurations used
by actual routers, for example, a route-map to prefer traffic from customers
over peers.

3. We were able to conduct relative time comparisons between BGP updates and determine
the time taken for hijacks to converge.

4. We discovered that due to our implemented policies, certain hijacks had less effect
on the flow of traffic.

Algorithm 2 Find polluted paths
paths = []
for update in updates do
    if update.prefix == attack_prefix then
        paths.append(update.path)
true = []
polluted = []
for path in paths do
    if last_as_in_path == attacker_as then
        polluted.append(path)
    else
        true.append(path)
polluted_ases = set()
for polluted_path in polluted do
    for true_path in true do
        if first_as_in_polluted_path == first_as_in_true_path then
            if len(polluted_path) < len(true_path) then
                polluted_ases.add(first_as_in_polluted_path)
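
Algorithm 2 can be sketched in runnable Python as shown below. The sketch assumes each update has already been reduced to a (prefix, AS path) tuple in which the first AS is the observing node and the last AS is the origin, and it reads the length comparison as a comparison of path lengths; the MRT parsing step and the function name are assumptions and are not shown.

```python
def find_polluted_ases(updates, attack_prefix, attacker_as):
    """Return the set of observer ASes whose path to attack_prefix was hijacked.

    updates -- iterable of (prefix, as_path) tuples; as_path is a list whose
               first element is the observing AS and last element the origin AS
    """
    paths = [path for prefix, path in updates if prefix == attack_prefix]
    # Split into paths that end at the attacker and paths to the true origin.
    polluted = [p for p in paths if p[-1] == attacker_as]
    true = [p for p in paths if p[-1] != attacker_as]
    polluted_ases = set()
    for polluted_path in polluted:
        for true_path in true:
            # Same observer, and the hijacked route is the shorter one.
            if polluted_path[0] == true_path[0] and len(polluted_path) < len(true_path):
                polluted_ases.add(polluted_path[0])
    return polluted_ases
```

Dividing the size of the returned set by the number of nodes in the topology gives the percentage polluted for one scenario.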


Consider the example graph depicted in Figure 4.4. Using no policies, if node 10 hijacks 
a prefix owned by node 1, both nodes 11 and 12 would be polluted due to a new shortest 
path advertisement from the hijacking node 10. The common policies implemented in our 
network prevent this from occurring by filtering route advertisements between Tier 2 peers. 
Therefore, if node 10 hijacks a prefix owned by node 1, neither node 11 nor node 12 would be
polluted due to a preferred path via their providers. These types of commonly implemented
policies impact the behavior of many models and demonstrate the value of emulation.

Figure 4.3: The x axis indicates which tier attacked (or hijacked) and each
bar represents the victim tier. Note that the nodes we tested were ineffective
in hijacking nodes within a lower tier.



Figure 4.4: Nodes 1, 2, and 3 are Tier 1 nodes. Nodes 10, 11, 12, and 13 are Tier 2 nodes.
The standard behavior of BGP is overridden by policy and therefore impacts
the effects of a hijack.

CHAPTER 5:

Conclusions and Future Work 


In this thesis we extended the ERIK platform in order to: 

1. Automate the creation of an emulated network topology across multiple physical 
servers 

2. Efficiently allocate emulated devices to minimize traffic between physical devices

The resulting system, DERIK, uses the following methodology:

1. Accept a network model as input 

2. Analyze and distribute the topology to physical devices 

3. Generate the automation scripts and configuration files 

4. Deploy the topology from a local client to the physical infrastructure 

Once the emulated topology is created, it can be utilized for arbitrary test scenarios of 
non-trivial topologies. In addition, we show how test scenarios can interact with the 
applications on the servers, connect to other Virtual Machines (VMs) running on the server, 
or pass modification messages via the hypervisor. 

While we do not explore the upper-bounds of the topology size that DERIK can emulate, 
this thesis creates a topology of over 1,000 emulated routers (approximately three times 
larger than previously achieved by ERIK). 

A primary contribution of this thesis is exploring the means by which emulated resources 
are distributed. Chapter 4 discusses the results of our optimization routines. We see a 
reduction in traffic between physical servers using either the heuristic approach or the min- 
load method versus a uniform allocation. These results indicate that simple consideration of 
placement will allow the system to best accommodate input topologies for a given physical 
infrastructure. 

Additionally, we examine the benefits of emulation by creating a BGP hijack attack on
our emulated environment. The execution of this experiment, as explained in Chapter 4,
demonstrates that behavior is present in an emulated environment which may be missing,
or require additional complexity to implement, in a simulated environment.


5.1 Future Work 

In this section, we detail our suggestions for work that builds on our results and analysis. 


Internal BGP 

In our models we represented each AS using a single router in order to maximize the
total size of the AS-level topology created. This method places several constraints on our
networks. First, as the size of the network increases, our models will be limited in the
number of links connected to a single device and will therefore limit the connections of a
single AS. Analysis of BGP adjacencies from the Internet [42] shows that many ASs have
many more adjacencies than can be supported by a single physical device, and in order to
model these, multiple routers per AS are required. DERIK currently only supports External
Border Gateway Protocol (eBGP) neighbors, and configuration for Internal BGP (iBGP) would
be required for these models. We believe that adding this capability would allow the
exploration of even larger topologies or models with larger degrees per AS.


Mixed device topologies 

While our results from Chapter 4 were obtained using a network of entirely Cisco devices, 
additional emulation tools such as Qemu and Quagga exist and are widely used. We 
considered extensibility throughout the development process and simplified the addition of 
new platforms to the software, but did not implement devices other than Cisco models. Many 
real world networks are very heterogeneous and it would be beneficial to add the capability 
to create topologies consisting of multiple vendors, in order to analyze differences between 
vendor implementations of protocols. 

During our distribution methods we consider all devices to require an equal amount of
memory and processing. As discussed in Section 3.2.3, distribution of devices with unique
memory and processing requirements is possible with some minor modifications. This
could prove valuable for customizing specific nodes within a topology. For example, a Tier
1 node with a high degree could be allocated more memory within the hypervisor, or be
configured to allocate more memory towards I/O to account for an expected higher load.

Commercial solver software

Our attempts at linear optimization were completed using freely available solver software,
which may not be the ideal platform for a specific problem. Paid solver software such
as IBM’s CPLEX [62] exists and may be beneficial for solving the min-load problem on
larger topologies. Though the heuristic method performed very well, a guaranteed optimal
solution for min-load may be desired, and analysis of the linear program's performance using
alternative solvers would be beneficial.

Further experimental scenarios 

We believe there are many opportunities for further scenario testing using DERIK. For
example, the Department of Defense (DoD) and its subordinate agencies operate and control
very large networks. Emulating portions of these networks before applying configuration
or policy changes could prove valuable in optimizing performance and minimizing
downtime. Our use of templates could also provide a benefit to the administration of tactical
type networks. Currently, a “baseline” configuration for devices is used to provide
required elements such as access-lists. Then, much of the remaining configuration is
completed manually to fit a defined network model. Automated generation of configurations
using templates could reduce preparation time and time required for troubleshooting
configuration errors.

As demonstrated by Rye et al., analysis of measurement tools is another valuable research
opportunity. New and innovative measurement tools, such as BGPStream [63], are developed
but require testing and validation before they are employed in the real world. We
believe DERIK is a useful platform for creating a ground truth network to test such tools
against.

APPENDIX: Python Code for Linear Program


import time
import warnings

from pulp import (LpBinary, LpMinimize, LpProblem, LpStatus, LpVariable,
                  PULP_CBC_CMD, lpSum, value)

# "const" and "Distributor" are defined elsewhere in the DERIK source.
# The function below is a static method of the Distributor class.


@staticmethod
def _min_load_distribution(emulated_topology, servers):
    distribution = {}
    TOLERANCE = .00001
    num_servers = len(servers)
    num_devices = len(emulated_topology.devices)

    # create a mapping of position vs id
    server_mapping = {idx: server.id for (idx, server) in enumerate(servers)}
    device_mapping = {idx: device.id for (idx, device)
                      in enumerate(emulated_topology.devices)}

    server_idxs = range(num_servers)
    device_idxs = range(num_devices)

    traffic_matrix = Distributor._populate_traffic_matrix(emulated_topology.links,
                                                          num_devices)
    max_num_devices = [server.max_devices for server in servers]

    if sum(max_num_devices) < num_devices:
        raise RuntimeError("The servers cannot support this many devices")
    elif max(max_num_devices) >= num_devices:
        warnings.warn("All the devices will go on one server...")

    setup_time_start = time.time()

    # allocated[(s, d)] is 1 when device d is placed on server s
    allocated = LpVariable.dicts("Allocated", [(s, d) for s in server_idxs
                                               for d in device_idxs], 0, 1, LpBinary)
    server_cost = LpVariable.dicts("ServerCost", [s for s in server_idxs], lowBound=0)
    # inter_server[(s, d1, d2)] is 1 when d1 is on server s and d2 is not
    inter_server = LpVariable.dicts("Inter-Server", [(s, d1, d2)
                                                     for s in server_idxs
                                                     for d1 in device_idxs
                                                     for d2 in range(d1, len(device_idxs))],
                                    0, 1, LpBinary)

    problem = LpProblem("Router Allocation", LpMinimize)

    # every device is placed on exactly one server
    for d in device_idxs:
        problem += lpSum(allocated[(s, d)] for s in server_idxs) == 1

    # no server exceeds its device capacity
    for s in server_idxs:
        problem += lpSum(allocated[(s, d)] for d in device_idxs) <= max_num_devices[s]

    # linearize inter_server = allocated[s, d1] AND NOT allocated[s, d2]
    for s in server_idxs:
        for d1 in device_idxs:
            for d2 in range(d1, len(device_idxs)):
                problem += inter_server[(s, d1, d2)] <= allocated[(s, d1)]
                problem += inter_server[(s, d1, d2)] <= (1 - allocated[(s, d2)])
                problem += inter_server[(s, d1, d2)] >= \
                    allocated[(s, d1)] + (1 - allocated[(s, d2)]) - 1

    problem += lpSum(server_cost[s] for s in server_idxs) >= -1

    # objective: minimize the traffic carried between servers
    problem += lpSum(inter_server[(s, d1, d2)] * traffic_matrix[d1][d2]
                     for s in server_idxs for d1 in device_idxs
                     for d2 in range(d1, len(device_idxs)))

    max_seconds = const.MAX_MINUTES * 60
    # Newer PuLP releases name these options timeLimit and gapRel.
    problem.solve(PULP_CBC_CMD(msg=1, maxSeconds=max_seconds, fracGap=const.MAX_GAP))

    if str(LpStatus[problem.status]) == "Infeasible":
        raise RuntimeError("Distribution infeasible")
    elif str(LpStatus[problem.status]) == "Not Solved":
        if value(problem.objective) <= 0:
            raise RuntimeError("No feasible solution found")

    for s in server_idxs:
        if s not in distribution:
            distribution[s] = []
        distribution[s] += [d for d in device_idxs
                            if allocated[(s, d)].varValue > TOLERANCE]

    # translate positions back to server and device ids
    server_id_device_id = {}
    for idx, server_id in server_mapping.items():
        server_id_device_id[server_id] = []
        for device_idx in distribution[idx]:
            server_id_device_id[server_id].append(device_mapping[device_idx])

    return server_id_device_id